Modelling the probability of building fires

Systematic spatial risk analysis plays a crucial role in preventing emergencies. In the Czech Republic, risk mapping is currently based on the risk accumulation principle, area vulnerability, and the preparedness levels of Integrated Rescue System components. Expert estimates are used to determine risk levels for individual hazard types, while statistical modelling based on data from actual incidents and their possible causes is not used. Our model study, conducted in cooperation with the Fire Rescue Service of the Czech Republic in the Liberec and Hradec Králové regions, presents an analytical procedure for creating building fire probability maps based on recent incidents in the studied areas and on building parameters. To estimate the probability of building fires, a prediction model based on logistic regression was used. The fire probability calculated from the model parameters and the attributes of specific buildings can then be visualized in probability maps.


Introduction
Emergencies, whether caused by forces of nature or by human activity, have accompanied humanity throughout its history. Nevertheless, developed societies endeavour to prevent emergencies and mitigate their negative effects. A key role in these efforts is played by systematic spatial risk analysis, an integral part of the preventive security measures used in crisis and emergency planning. Such analyses take advantage of the possibilities offered by modern technology and geographic information systems (GISs), along with the availability of many suitable mapping resources. Risk analysis therefore enables efficient readiness planning for Integrated Rescue System units and more rapid responses to emergencies when they occur. That, in turn, improves the protection of the population.
In other countries, considerable attention is devoted to developing risk-mapping measures that protect inhabitants, although the procedures of individual countries differ both in the depth of analysis used and in the extent to which the results of those analyses are applied [15]. In Finland, for example, detailed risk mapping is required by law and is carried out through uniform procedures at the municipal level. A basic GIS application was developed for these purposes [11], and it is being further elaborated through academic research (e.g. [1], [12], [18]). Attention is focused mainly on input values, with the objective of creating a realistic risk model corresponding to actual emergencies. Recent records of conducted interventions are therefore used in this development. Significant predictors are determined using sophisticated methods of spatial analysis that also consider how the studied phenomena vary over time.
In the Czech Republic, risk mapping is currently based on a methodology developed by a team at the Department for Population Protection and Emergency Management of the Fire Rescue Service (FRS) of the Moravian-Silesian Region [13], which in turn draws on a methodology recommended by the European Union. The methodology for creating risk maps is based on the risk accumulation principle, area vulnerability, and the preparedness levels of Integrated Rescue System components, with the aim of minimizing the negative impacts of emergencies. The resulting risk maps represent existing risk levels in the given territory, expressed as values between 0 and 1. To determine risk rates for individual hazard types, expert estimates based on statistical data and on the expert team's experience are used. The methodology therefore does not use current spatial data on actual FRS interventions, which could provide detailed information on the distribution of events across space and time. Although the resulting risk maps present theoretical values of accumulated risk, they are static and account neither for the constantly changing spatial pattern of actual events nor for their possible causes.
As existing research on emergency management shows ([5], [6], [8], [9], [19]), risk mapping requires distinguishing individual hazard types and determining risk levels separately for each type [13]. Research usually focuses on phenomena such as traffic accidents ([3], [14], [21]), forest fires ([4], [7], [17]), and building fires ([19], [20], [22], [23]). The main objective of our study was to contribute to the current risk mapping methodology by preparing and testing an analytical procedure that identifies the factors associated with building fires, based on actual incidents recorded by FRS units. The aim was thus to find and verify a suitable procedure, beginning with input data quality analysis and continuing through GIS analysis and statistical modelling. A partial objective was to determine which building attributes are significant in fire incidents and suitable as predictors for building fire probability maps, and whether such detailed attributes can be at least partially replaced by easily accessible data. This model study was performed within the Liberec and Hradec Králové regions in cooperation with the FRSs of the two regions.

Building fires data and data quality analysis
Fire incident data (covering 2010-2012) and the building layer were provided for the project by the FRSs of the Liberec and Hradec Králové regions; the building layer originally comes from the Czech Statistical Office and results from the 2011 Population and Housing Census. The authors also intended to use detailed data on the socioeconomic characteristics of households from the same census, but at the time the study was prepared these data were not yet available from the Czech Statistical Office. The analysis included only records of fires in buildings, i.e. records with the attributes "low-rise buildings", "high-rise buildings" and "industrial and agricultural buildings and warehouses". False alarms, tactical trainings, testing trainings, and technology tests were then removed from the selected records on the basis of other attributes, as were inter-regional and international interventions extending beyond the borders of the model territory.
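The record selection described above amounts to a simple attribute filter. The sketch below assumes hypothetical field names (`object_type`, `event_type`, `inside_model_area`) and label strings; the actual FRS export schema is not described in the text.

```python
# Hypothetical field names and label strings; the real FRS record schema is not given in the text.
BUILDING_TYPES = {
    "low-rise buildings",
    "high-rise buildings",
    "industrial and agricultural buildings and warehouses",
}
EXCLUDED_EVENTS = {"false alarm", "tactical training", "testing training", "technology test"}


def select_building_fires(records):
    """Keep building-fire records; drop false alarms, drills, tests and interventions
    extending beyond the model territory."""
    return [
        r
        for r in records
        if r["object_type"] in BUILDING_TYPES
        and r["event_type"] not in EXCLUDED_EVENTS
        and r.get("inside_model_area", True)
    ]


incidents = [
    {"id": 1, "object_type": "low-rise buildings", "event_type": "fire"},
    {"id": 2, "object_type": "forest", "event_type": "fire"},
    {"id": 3, "object_type": "high-rise buildings", "event_type": "false alarm"},
    {"id": 4, "object_type": "high-rise buildings", "event_type": "fire", "inside_model_area": False},
]
print([r["id"] for r in select_building_fires(incidents)])  # [1]
```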
Because the results of an analysis depend on the quality of the input data (see Shi et al. [16]), and because a fire's location may be designated incorrectly at the operating centre or in the field (not all vehicles are equipped with GPS receivers), the first step was a detailed quality analysis of the data, examining both positional and attribute accuracy. Most of the errors found in the fire records layer dated from 2010 and 2011, while the 2012 data had the fewest deficiencies. This demonstrates the effectiveness of the changes in data collection methodology implemented by the FRS between these years. After erroneous records were removed or corrected, the 2012 data were therefore used in the analyses to create probability maps.
Deficiencies in the positional component consisted primarily of missing coordinates in certain records, switched x and y axes, the use of multiple coordinate systems, and inaccurate localization of records. Inaccurate localization involved two types of error. In the first, the location was designated only by data such as the municipality's name, so the intervention was placed at the centre of the municipality, which in most cases also produced several records with identical positions. In the second, inaccurately designated coordinates placed a building fire outside any building. Before further analysis, records with missing positions were removed, as were records sharing an identical position where it was verified that the interventions had in fact occurred at different locations; records with switched coordinates were corrected, and all coordinates were unified into the Czech and Slovak coordinate system known as S-JTSK.
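Two of these checks can be sketched in Python, assuming positive-valued (classic) S-JTSK coordinates rather than the negative EPSG:5514 convention, stored in hypothetical `y` and `x` fields, and assuming approximate bounds for Czech territory. Under these assumed bounds the two ranges do not overlap, so a record whose values fall into each other's ranges can be treated as having switched axes. Positions shared by several records are only flagged, because the text removes them only after verifying that the interventions actually took place elsewhere.

```python
from collections import Counter

# Assumed approximate S-JTSK bounds (metres, positive-valued Y and X) for Czech territory.
Y_RANGE = (430_000, 910_000)
X_RANGE = (930_000, 1_230_000)


def _within(value, bounds):
    return bounds[0] <= value <= bounds[1]


def clean_position(record):
    """Return a corrected (y, x) pair, or None if the position is missing or implausible."""
    y, x = record.get("y"), record.get("x")  # hypothetical field names
    if y is None or x is None:
        return None
    if _within(y, Y_RANGE) and _within(x, X_RANGE):
        return (y, x)
    if _within(x, Y_RANGE) and _within(y, X_RANGE):  # axes switched
        return (x, y)
    # Anything else (e.g. WGS84 degrees) would need a proper transformation, not handled here.
    return None


def shared_positions(positions, min_count=2):
    """Positions used by several records (often a municipality centre), flagged for manual checking."""
    counts = Counter(p for p in positions if p is not None)
    return {p for p, n in counts.items() if n >= min_count}


records = [
    {"y": 650_000, "x": 1_000_000},
    {"y": 650_000, "x": 1_000_000},  # same point as above, e.g. a municipality centre
    {"y": 1_010_000, "x": 660_000},  # switched axes
    {"y": 15.05, "x": 50.77},  # degrees, not S-JTSK
    {"y": None, "x": 1_000_000},  # missing
]
positions = [clean_position(r) for r in records]
print(positions)
print(shared_positions(positions))
```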
Deficiencies in the attribute component appeared particularly in the building layer, mostly due to incorrect or inconsistent completion of attribute values. An obvious error can be seen in the number-of-storeys attribute, according to which most municipalities would contain 13-storey buildings. Attributes (columns) with probably erroneous values were not considered as possible predictors in subsequent analyses. Where attributes were redundant, the one with more completed elements or easier interpretation was selected as a possible predictor. An example is a pair of attributes describing the building owner: one had 13 levels, while the other, which was selected, had only 5.
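Choosing between redundant attributes by completeness can be sketched as follows. The attribute names `owner_13` and `owner_5` are hypothetical, and interpretability, the second criterion mentioned above, is represented only by the order of the candidate list.

```python
def completeness(rows, attr):
    """Share of rows in which the attribute is filled in."""
    return sum(1 for r in rows if r.get(attr) not in (None, "")) / len(rows)


def pick_attribute(rows, candidates):
    """Among redundant attributes, keep the most complete one. max() returns the first of
    equally complete candidates, so list the more interpretable attribute first."""
    return max(candidates, key=lambda a: completeness(rows, a))


# Hypothetical owner attributes with 13 and 5 levels.
buildings = [
    {"owner_13": "private person", "owner_5": "individual"},
    {"owner_13": None, "owner_5": "cooperative"},
    {"owner_13": "", "owner_5": "municipality"},
]
print(pick_attribute(buildings, ["owner_13", "owner_5"]))  # owner_5
```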
For regression modelling, the two data layers had to be linked so that each fire record was unambiguously assigned to a building. The link was created by a GIS overlay analysis (using the Spatial Join tool in ArcGIS 10.2). Records within 50 m of the nearest building were assigned to that building, while the remaining records were removed. The result was a layer of buildings with a binary attribute indicating whether or not a fire had occurred in the given building. No building was assigned more than one record. The last reduction of the data layer removed from the regression model all elements (i.e. buildings) that lacked complete data for any of the selected attributes (see the Statistical analysis section below). The required data filtering was automated using Python scripts.
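The join and the complete-case filter can be sketched in plain Python. This is not the ArcGIS Spatial Join used in the study: for simplicity it assumes each building is represented by a centroid point in metres and uses a brute-force nearest-neighbour search, which is adequate for illustration but would be replaced by a spatial index for a whole region. The function names are hypothetical.

```python
import math


def flag_buildings_with_fire(fires, buildings, max_dist=50.0):
    """Assign each fire point to the nearest building centroid within max_dist metres.

    fires: list of (x, y); buildings: {building_id: (x, y)} centroids (an assumption;
    the study joined against the building layer in ArcGIS). Fires with no building
    within max_dist are dropped. Returns {building_id: 1 if a fire was assigned else 0}.
    """
    fire = {b_id: 0 for b_id in buildings}
    for fx, fy in fires:
        nearest, nearest_d = None, max_dist
        for b_id, (bx, by) in buildings.items():
            d = math.hypot(fx - bx, fy - by)
            if d <= nearest_d:
                nearest, nearest_d = b_id, d
        if nearest is not None:
            fire[nearest] = 1
    return fire


def complete_cases(rows, attrs):
    """Keep only buildings with a value for every selected attribute."""
    return [r for r in rows if all(r.get(a) is not None for a in attrs)]


centroids = {"A": (0.0, 0.0), "B": (100.0, 0.0)}
print(flag_buildings_with_fire([(60.0, 0.0), (500.0, 500.0)], centroids))  # {'A': 0, 'B': 1}
```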

Factors potentially affecting fire probability
The next step involved analysing factors potentially affecting fire probability and selecting suitable predictors for the probability maps. Basic demographic data on the territory and building attributes were examined.
As potentially significant building attributes, we considered building type (low-rise, high-rise, industrial), house type (apartment, single-family, semi-detached, row), number of apartments (category), gas supply (yes/no), boiler room presence (yes/no), heating type (in-house, in-apartment, remote, other), building age (before 1920, 1920-1945, 1946-1960, then categories by decade up to 2000, with individual years distinguished after 2000), elevator presence (yes/no), and owner (individual, cooperative, municipality, other legal entities). The examined demographic data were spatially related to administrative units of various levels (municipalities, municipalities with a delegated municipal office, municipalities with extended jurisdiction, districts). The number of inhabitants, territorial unit size, population density, and unemployment rate were considered, on the assumption that higher fire probability corresponds to a larger population, a larger territory, greater population density, and a higher unemployment rate. For this part of the analysis, layers from the freely accessible ArcČR500 3.0 database were used.
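Binning construction years into the building-age categories listed above is a small lookup. The decade boundaries (1961-1970 up to 1991-2000) are an assumption, since the text states only "categories by decade up to 2000".

```python
def age_category(year):
    """Map a construction year to the building-age categories described above.
    The decade boundaries 1961-1970, ..., 1991-2000 are assumed."""
    if year < 1920:
        return "before 1920"
    if year <= 1945:
        return "1920-1945"
    if year <= 1960:
        return "1946-1960"
    if year <= 2000:
        start = 1961 + (year - 1961) // 10 * 10
        return f"{start}-{start + 9}"
    return str(year)  # individual years after 2000


print([age_category(y) for y in (1905, 1938, 1975, 2000, 2007)])
# ['before 1920', '1920-1945', '1971-1980', '1991-2000', '2007']
```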

Statistical analysis
First, correlations between the number of fires and the number of inhabitants, population density, territorial unit size, and unemployment rate were evaluated. As is usual for count data, the distribution of the number of fires within territorial units differed significantly from the normal distribution and was skewed towards small values. The strength of the relationship between fire incidents and demographic data was therefore evaluated using Spearman's rank correlation coefficient.
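Spearman's coefficient is the Pearson correlation of the ranks. The sketch below offers two tie-handling options: the common average-rank (midrank) convention and the random rank assignment discussed for municipal-level data in the results. Which option the authors' R scripts used is not stated, and the counts in the example are hypothetical.

```python
import math
import random


def ranks(values, ties="average", rng=None):
    """1-based ranks. Tied values get their mean rank ("average") or a random
    order of the ranks they occupy ("random")."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        group = order[start:end + 1]  # positions start..end hold one tied value
        if ties == "random":
            (rng or random).shuffle(group)
            for offset, idx in enumerate(group):
                result[idx] = float(start + offset + 1)
        else:
            for idx in group:
                result[idx] = (start + end) / 2 + 1
        start = end + 1
    return result


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x, y, ties="average", rng=None):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x, ties, rng), ranks(y, ties, rng))


# Hypothetical fire counts and populations for five territorial units.
fires = [3, 0, 0, 7, 12]
inhabitants = [4_200, 900, 1_500, 8_800, 15_000]
print(ranks(fires))  # [3.0, 1.5, 1.5, 4.0, 5.0]
print(round(spearman(fires, inhabitants), 3))
```

With many tied low counts, as in small municipalities, the random option gives a different coefficient on each run, which illustrates why such ties can distort the result.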
To determine significant building attributes, relative fire frequencies within the individual levels of each building attribute were tested, i.e. the absolute fire frequency within each level was divided by the absolute frequency of buildings with that attribute level. Differences between these frequencies were tested using a test for homogeneity of several binomial distributions (as many as the given attribute had levels). The tested hypothesis is that fire probability is the same in all of the attribute's levels. All hypotheses were tested at a significance level of α = 0.05. These analyses resulted in a list of potential predictors (building attributes) that then entered fire probability modelling. The subsequent creation of probability maps was based on a regression model describing the relationship between fire probability and a combination of building attributes. Given the binary nature of the fire data, we used logistic regression (see Agresti [2]). Only buildings with complete attributes (i.e. about half of the records) were entered into the model. Using a simple visualization of omitted and retained buildings, we verified that the spatial pattern of the omitted buildings is similar to that of the retained ones, i.e. that the necessary omission of half of the records would only minimally affect the subsequent spatial expression of fire probability. The resulting model included those predictors that satisfied the conditions of the logistic model (i.e. that the logit is approximately linear in each quantitative predictor) and significantly contributed to model quality. Predictors were chosen by backward selection, i.e. by successive model simplification. The initial model contained all considered predictors without their interactions. The basic criterion for model selection was the lowest value of the Akaike information criterion (AIC; see Agresti [2] for method details).
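The test for homogeneity of several binomial distributions is equivalent to Pearson's chi-square test on a 2 × k table (fire / no fire by attribute level) with k - 1 degrees of freedom. A dependency-free sketch follows; the chi-square tail probability uses a standard recurrence for the regularized upper incomplete gamma function, and the function names and counts are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution with integer df >= 1,
    via the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)  # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def binomial_homogeneity_test(fires, buildings):
    """Chi-square test that fire probability is equal across all levels of one attribute.

    fires[i]: buildings with a fire at level i; buildings[i]: all buildings at level i.
    Returns (statistic, degrees of freedom, p-value).
    """
    p_pooled = sum(fires) / sum(buildings)  # common probability under the hypothesis
    stat = 0.0
    for f, n in zip(fires, buildings):
        exp_fire, exp_none = n * p_pooled, n * (1.0 - p_pooled)
        stat += (f - exp_fire) ** 2 / exp_fire + ((n - f) - exp_none) ** 2 / exp_none
    df = len(fires) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts for an attribute with three levels.
stat, df, p = binomial_homogeneity_test([40, 25, 5], [2_000, 2_500, 1_000])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}, reject at 0.05: {p < 0.05}")
```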
To evaluate a model's discriminating power, we used a generally acknowledged summary statistic: the area under the receiver operating characteristic (ROC) curve. This value is identical to the so-called concordance, which is the relative frequency of "concordant" pairs, i.e. pairs consisting of one record with an event (fire) and one without, in which the estimated probability is higher for the record with the event. A usable model's concordance must clearly be substantially greater than 0.5, the value expected from random guessing. Hosmer and Lemeshow [10] presented the following interpretation of such values: 0.7-0.8 acceptable discrimination; 0.8-0.9 excellent discrimination; above 0.9 "outstanding discrimination".
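Concordance can be computed directly from its pair-counting definition. This sketch counts tied pairs as one half, the usual convention for the ROC area (an assumption, since the text does not mention ties).

```python
def concordance(p_event, p_no_event):
    """Share of (event, non-event) pairs in which the record with the event (fire) has the
    higher estimated probability; tied pairs count 1/2. Equals the area under the ROC curve."""
    score = 0.0
    for pe in p_event:
        for pn in p_no_event:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(p_event) * len(p_no_event))


print(concordance([0.9, 0.4], [0.3, 0.5]))  # 0.75
```

The double loop is quadratic in the number of buildings; for a whole building layer, the equivalent rank-based (Mann-Whitney) formula gives the same value far faster.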
In addition to the discrimination calculation, the resulting model underwent internal validation performed in 100 iterations. In each iteration, the input data were divided into two disjoint sets: a calibration set (75% of records), on which the model was fitted, and a validation set (25% of records), to which the model fitted on the calibration set was applied. The parameter values of the individual models were summarized to assess their stability (i.e. how much their values fluctuated across the iterations). In addition to parameter values, concordance was calculated in each iteration, separately for the calibration and validation sets.
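The splitting scheme and the stability summary can be sketched as follows. `fit_and_score` is a hypothetical caller-supplied function standing in for fitting the logistic model and computing both concordances, and the quartile convention (`method="inclusive"`, linear interpolation) is a choice that may differ from the one used in the original R scripts. The summary mirrors the QRNGdiv2 and CVar measures described in the caption of Table 2.

```python
import random
import statistics


def split_indices(n, calib_share=0.75, rng=None):
    """Randomly split record indices into disjoint calibration and validation sets."""
    rng = rng or random.Random()
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * calib_share)
    return idx[:cut], idx[cut:]


def internal_validation(n, fit_and_score, n_iter=100, calib_share=0.75, seed=1):
    """Repeat the random split n_iter times. fit_and_score(calib, valid) is supplied by the
    caller (hypothetical) and returns (params, concordance_calib, concordance_valid)."""
    rng = random.Random(seed)
    return [fit_and_score(*split_indices(n, calib_share, rng)) for _ in range(n_iter)]


def robust_stability(values):
    """Median, half the inter-quartile range (QRNGdiv2) and their ratio in percent (CVar)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med = statistics.median(values)
    half_iqr = (q3 - q1) / 2
    return med, half_iqr, 100 * half_iqr / abs(med)


# Dummy stand-in for the real model fit, returning one "parameter" and two concordances.
def dummy_fit_and_score(calib, valid):
    return [len(calib) / len(valid)], 0.80, 0.78


runs = internal_validation(400, dummy_fit_and_score)
print(len(runs), robust_stability([r[0][0] for r in runs]))
```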
Table 1: Selection of the prediction model for building fire probability. Backward selection was used, with the Akaike information criterion (AIC) as its criterion. Individual selection steps are separated by lines. In each step, the model with the lowest AIC value was selected (selected models are marked in bold). The process was repeated until all AIC values in the current step were greater than the AIC value of the best model from the previous step. The final selected model is thus the best model of the penultimate step, i.e. model 4a. This model was subsequently used to create probability maps. Legend for predictors: v1 = heating, v2 = owner, v3 = elevator, v4 = wall material, v5 = house type, v6 = gas, v7 = category according to number of apartments.
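The selection rule in this caption can be sketched in Python. The study used its own R scripts, which are not shown; here `fit_logistic` is a plain Newton-Raphson maximum-likelihood fit returning AIC = 2k - 2 log L (k = number of estimated coefficients), and `backward_selection` accepts any function that returns the AIC of the model for a given predictor set, so it can be checked against a toy AIC table. All names and example numbers are illustrative.

```python
import math


def _solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X: rows of predictor values with a leading 1 for the intercept; y: 0/1 fire flags.
    Returns (coefficients, AIC)."""
    k = len(X[0])
    beta = [0.0] * k

    def probs(b):
        return [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(b, row)))) for row in X]

    for _ in range(max_iter):
        p = probs(beta)
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        hess = [[sum(pi * (1.0 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = _solve(hess, grad)  # Newton step: (X'WX)^-1 X'(y - p)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = probs(beta)
    loglik = sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))
    return beta, 2 * k - 2 * loglik


def backward_selection(predictors, aic_of):
    """Drop one predictor per step, keeping the reduced model with the lowest AIC; stop when
    every reduced model has an AIC no lower than the best model of the previous step."""
    current = frozenset(predictors)
    best_aic = aic_of(current)
    while current:
        candidates = {current - {v}: aic_of(current - {v}) for v in current}
        model, aic = min(candidates.items(), key=lambda kv: kv[1])
        if aic >= best_aic:
            break
        current, best_aic = model, aic
    return current, best_aic


# Toy check of the fit: 2/10 fires without an elevator, 5/10 with one (hypothetical data).
X = [[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5
beta, aic = fit_logistic(X, y)
print([round(b, 4) for b in beta], round(aic, 3))  # [-1.3863, 1.3863] 27.871

# Toy AIC table (hypothetical values) for the selection rule.
toy_aic = {
    frozenset({"v1", "v2", "v3"}): 100.0,
    frozenset({"v1", "v2"}): 95.0,
    frozenset({"v1", "v3"}): 99.0,
    frozenset({"v2", "v3"}): 101.0,
    frozenset({"v1"}): 97.0,
    frozenset({"v2"}): 98.0,
}
print(backward_selection(["v1", "v2", "v3"], toy_aic.__getitem__))
```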

After model selection and validation, the resulting model was used to predict fire probability in individual buildings. The estimated probabilities were added as a new attribute to the attribute table of the building layer, which enabled their visualization in the form of probability maps. The estimated probabilities were then averaged within territorial units, and the results were again visualized in map form. All statistical calculations described in this section were performed using our own scripts in R, while map visualization was done in ArcGIS 10.2.
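Computing per-building probabilities from a model in treatment parameterization and averaging them within territorial units can be sketched as below. The coefficients are made-up placeholders, not the estimates in Table 2, and the attribute names are hypothetical; basic levels contribute an effect of 0.

```python
import math
from collections import defaultdict

# Placeholder coefficients (NOT the Table 2 estimates); basic levels have an implicit effect of 0.
INTERCEPT = -5.0
EFFECTS = {
    ("house_type", "apartment"): 0.8,
    ("elevator", "yes"): 0.4,
    ("gas", "yes"): 0.3,
}


def fire_probability(building):
    """Inverse logit of the intercept plus the effects of the building's attribute levels."""
    eta = INTERCEPT + sum(EFFECTS.get((attr, level), 0.0) for attr, level in building.items())
    return 1.0 / (1.0 + math.exp(-eta))


def mean_by_unit(buildings, unit_key="unit"):
    """Average predicted probability within each territorial unit."""
    sums, counts = defaultdict(float), defaultdict(int)
    for b in buildings:
        sums[b[unit_key]] += b["p_fire"]
        counts[b[unit_key]] += 1
    return {u: sums[u] / counts[u] for u in sums}


buildings = [
    {"unit": "A", "house_type": "apartment", "elevator": "yes", "gas": "no"},
    {"unit": "A", "house_type": "single-family", "elevator": "no", "gas": "yes"},
    {"unit": "B", "house_type": "single-family", "elevator": "no", "gas": "no"},
]
for b in buildings:
    b["p_fire"] = fire_probability(b)  # new attribute, as added to the building layer
print(mean_by_unit(buildings))
```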

Demographic characteristics and territorial extent
When Spearman's rank correlation coefficient is used to analyse associations between the number of fires and demographic characteristics and territorial extent, selecting suitable territorial units can be difficult. In units as small as a municipality, fire counts are frequently duplicated (low values repeat often). Such ties can be resolved by random assignment of rank, but if the tested set contains many of them, the result, and particularly the significance of the correlation coefficient, can be considerably distorted. Excessively large units, meanwhile, do not provide sufficiently detailed spatial information. Based on the results of the present study, optimal practice means using the poly- […] For municipalities with a delegated municipal office, the number of inhabitants had slightly greater correlation coefficients for individual fire types, and for municipalities with extended jurisdiction the correlations with territorial extent and with the number of inhabitants are almost identical. The number of fires exhibited no association with population density or unemployment, i.e. with any demographic variable that could potentially be used to predict fires at the level of individual houses. Of the tested variables, the number of inhabitants and territorial extent are suitable predictors of the number of fires at the level of municipalities with a delegated municipal office and municipalities with extended jurisdiction. These easily accessible data (territorial extent is always available) can therefore be used for simple visualizations of fire hazards. These results are hardly surprising, however, as they only say that "the more space and potential sources (i.e. people) there are, the more likely fire is".

Prediction models and probability maps
When selecting a model by backward selection (see Table 1), interactions among predictors were not considered. A model with interactions would be too complex and, with many predictors, computationally prohibitive (which was also an obstacle for internal validation based on repeated iterations). In addition, interactions among individual variables would be very difficult to interpret, and given the relatively small number of events they could not be expected to prove significant.
Of the examined building attributes, a combination of house type, number of apartments, and whether a building had an elevator and a gas connection best predicted fire probability (see Table 1 describing model selection). The prediction model's resulting parameters are summarized in Table 2. The marginal non-significance of the number of apartments (p = 0.07) could raise doubts about retaining this predictor in the model. However, a predictor's individual significance is not the only criterion for retaining it. It is also necessary to consider whether there is a strong reason to believe that the predictor in fact influences the explained variable, which is obvious for the number of apartments and fire probability. It could therefore be argued that the "correct" model should be based on individual apartment units rather than entire houses. This is impossible, however, as the fire data do not identify specific apartment units, and such a model could not include fires in the common spaces of houses. Including the number of apartments as a predictor therefore efficiently addresses the problem of the unequal size of the basic spatial units (i.e. houses). Another reason to include the predictor is the result of the backward selection itself, in which including it led to a significantly better model and therefore a higher-quality prediction. The inclusion of the elevator predictor might seem rather surprising, as there is no clear interpretation offering a causal link to fires. Its inclusion is therefore based only on comparing the model with and without it (see backward selection of the model, Table 1). The probable explanation is that it is a "placeholder" predictor, i.e. a building attribute closely correlated with one or more attributes that are causally linked to fires. In backward selection, this predictor also "beat" predictors such as heating method and wall material, for which a correlation with fires could be expected. This suggests that its inclusion indirectly brings into the model building properties that influence fire probability but are not directly recorded in the building data.

Figure 1: Relationship between the number of fires in low-rise buildings and (a) number of inhabitants and (b) territorial extent (ha) of municipalities with extended jurisdictions

Figure 2: Relationship between the number of fires in low-rise buildings and (a) number of inhabitants and (b) territorial extent (ha) of municipalities with a delegated municipal office

Figure 3: Results of concordance calculations during internal validation. The boxplots summarize 100 iterations, in each of which a random 25% of the building records was used as the validation set and the remaining 75% to fit the model. The fitted model was then used for predictions on the validation set, and concordance was calculated for both the validation and calibration sets.

Figure 4: Section of the probability map created based on the prediction model, Liberec and environs, visualization by buildings, natural breaks classification.

Figure 5: Average fire probability within municipalities with a delegated municipal office, natural breaks classification.

Table 2: Model parameters and measures of their stability. Model parameters in treatment parameterization are given in the FULL column. Values stated for individual levels of qualitative predictors give the difference from the basic level included in the Intercept row. Asterisks next to values in the FULL column indicate the significance level in the test of a non-zero parameter or, for qualitative predictors, of a non-zero difference from the basic level (*, ** and *** mean p < 0.05, p < 0.01 and p < 0.001, respectively). The other columns summarize parameter values across the 100 iterations performed during internal validation. The QRNGdiv2 column gives half of the inter-quartile range as a robust alternative to the standard deviation; CVar gives the ratio of QRNGdiv2 to the absolute value of the median, in percent, i.e. a robust variant of the coefficient of variation showing how much the model parameters fluctuate around the median, expressed as a percentage of the median.