FOREST FIRE SUSCEPTIBILITY ASSESSMENT WITH MACHINE LEARNING METHODS IN NORTH-EAST TURKIYE

Forest fires have devastating effects on biodiversity, climate, and humans. Producing detailed and reliable forest fire susceptibility maps is crucial for disaster management. Data-driven machine learning methods can be applied for forest fire susceptibility mapping, and the learning data required for this purpose can be obtained from high-resolution satellite imagery along with a fire inventory. In this study, we assessed the performances of Random Forest (RF) and artificial neural network (ANN) classifiers for producing forest fire susceptibility maps of a region in north-east Türkiye covering Trabzon, Gümüşhane, Rize, and Bayburt provinces, using freely available Earth observation data and a forest inventory provided by the regional directorate. Forest type, EU-DEM v1.1 (25 m), and tree cover density were retrieved from the Copernicus Land Monitoring Service. Sentinel-2 images were utilized for calculating spectral indices such as the normalized difference vegetation index and the modified normalized difference water index to assess surface water and vegetation characteristics. Thus, a total of twelve variables including topographic, anthropogenic, hydrologic, vegetation and land use data were used as input. The RF and ANN showed similar prediction performances based on receiver operating characteristics (ROC) area under the curve (AUC) values, which were 0.89 and 0.88, respectively. The RF performed better in terms of overall accuracy (OA) and F-1 score. The susceptibility maps with 25 m resolution were also investigated visually. The ANN results predicted higher susceptibility levels, and larger areas were found prone to wildfire. Leave-one-out analysis results indicated that elevation was the most influential factor based on the achieved OA.


INTRODUCTION
Forest fires have devastating effects on a variety of environmental, social, and economic resources (Gill et al. 2013). Additionally, forest fires cause air pollution, destroy forest ecosystems, and contribute to global warming (Leuenberger et al. 2018). Therefore, the development of an efficient disaster management system to decrease the severity of wildfires and support mitigation efforts is essential (Adab et al. 2013). Although a variety of methods has been utilized in the literature to assess forest fire susceptibility, and the number of studies has increased, further investigations are still needed since the model accuracy depends on the method applied and on data availability and quality. More than 30 methods were used for this purpose between 2001 and 2021 (Chicas and Østergaard Nielsen 2022). The most frequently used data-driven machine learning (ML) models have been the random forest (RF) model (Cao et al. 2017) proposed by Breiman (2001), logistic regression (LR) (Rodrigues and de la Riva 2014), support vector machines (SVM) (Sachdeva et al. 2018), and artificial neural networks (ANN) (Zhang et al. 2021; Kantarcioglu et al., 2023). Comprehensive reviews on wildfire susceptibility and hazard mapping can be found in the literature (Jain et al., 2020; Chicas & Østergaard Nielsen, 2022).
The number and severity of forest fires have increased drastically in Turkiye over the last few years. Akinci and Akinci (2023) assessed the forest fire susceptibility for the Manavgat region of Antalya province, Turkiye, using the ignition inventory of the local forest authority and the XGBoost algorithm, which is based on boosted decision trees. Another recent work by Kantarcioglu et al. (2023) employed an ANN to produce a susceptibility map of the Thrace region, Turkiye, and compared it with the RF. An extensive forest database with tree types, tree ages, forest roads, and ignition locations of 1506 fires over a period of eight years was utilized in that study, and forest roads were found to be the most predictive variable.
In this study, two widely used ML classifiers, RF and ANN, were again implemented for producing forest fire susceptibility maps for the north-east side of Türkiye covering Trabzon, Gümüşhane, Rize, and Bayburt provinces. The ignition locations were obtained from the Trabzon Regional Directorate of Forestry (TRFD). The inventory includes 368 fire events recorded between the years 2013 and 2022. Thus, this study aims to identify areas prone to forest fires (ignition) by predicting their spatial probability using various input variables in combination with RF and ANN. Based on the literature survey, analysis of regional characteristics, and data availability, a total of 12 features were identified, including elevation, aspect, slope, tree cover density, forest type, settlement proximity, settlement density, water proximity, power line proximity, normalized difference vegetation index (NDVI), modified normalized difference water index (MNDWI), and land use land cover (LULC). The susceptibility maps obtained from both models were inspected visually and validated by using receiver operating characteristics (ROC) area under the curve (AUC) values, overall accuracy (OA), and F-1 score. In addition, a feature importance analysis based on the leave-one-out approach is presented and discussed.

Study Area and Fire Characteristics
The study site is located in the eastern part of the Black Sea region, Turkiye, and covers an area of 18,877 km² (Figure 1). The area is rather mountainous, and the northern parts are located at higher altitudes, as can be seen from the EU-DEM v1.1 (2023) data (Figure 2). The fire inventory used here consists of locations and dates of ignition events that occurred between 2013 and 2022 (see Figure 1). Further attributes include the risk level description recorded by the emergency response teams and the size of the burned areas, which can be used for hazard assessments in the future. A total of 368 fires served as the basis for forest fire susceptibility mapping. Half of these fires resulted in burnt areas < 1 hectare. The largest burnt area is approximately 130 hectares.

Methodology
The overall workflow of the study is presented in Figure 4. It consists of the preparation of the input features, selection of training and test samples, application and validation of the ML models (RF or ANN), and the production of susceptibility maps by applying the learned models. Further details are provided below.

Input Datasets and Pre-processing:
A total of six images obtained from the Sentinel-2 satellites of ESA were mosaicked and clipped for the site to produce maps of NDVI and MNDWI. All Sentinel-2 images were acquired on 29 September 2020. EU-DEM v1.1 (2023) with 25 m resolution was employed as the digital elevation model (DEM), and the slope and aspect maps were derived from this dataset. Tree cover density and forest type data were obtained from the Copernicus Land Monitoring Service (CLMS, 2023). In addition, power line data were extracted from the World Bank global electricity transmission and distribution lines dataset (World Bank, 2023). Buildings in the study area were extracted from the OpenStreetMap (OSM) Project repository (OSM, 2023). Likewise, water surfaces were extracted from the EU-Hydro database provided by the EU Copernicus Programme (EU-Hydro, 2023). A brief summary of the datasets is given in Table 1. The data pre-processing was carried out using the ArcGIS Pro software by Environmental Systems Research Institute (ESRI).

The datasets were clipped according to the study area boundaries. The vector datasets (settlements, buildings, power lines, river network) were rasterized with respect to the grid interval of EU-DEM. Density maps were estimated using the kernel density method. Proximity features were derived according to the Euclidean distance. MNDWI and NDVI features were computed from the Sentinel-2 imagery to assess the vegetation and water areas in the study area using Equations 1 and 2:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{1}$$

$$\mathrm{MNDWI} = \frac{\mathrm{Green} - \mathrm{SWIR}}{\mathrm{Green} + \mathrm{SWIR}} \tag{2}$$

The negative (non-fire) reference samples were randomly selected. Their number was set 50% greater than the number of fire samples, giving a fire to non-fire ratio of 1:1.5, to avoid any serious class imbalance. The samples were randomly split into training (80%) and validation (20%) sets (Figure 5). Existing Python libraries were utilized for the implementation of the RF and ANN, with manual hyperparameter optimization as explained next.
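The index computation and the sample split described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the random seed, and the tiny band arrays are hypothetical, and the study itself carried out the pre-processing in ArcGIS Pro.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b); pixels with a zero denominator become NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi(nir, red):
    """Equation 1: NDVI from near-infrared and red reflectance."""
    return normalized_difference(nir, red)

def mndwi(green, swir):
    """Equation 2: MNDWI from green and shortwave-infrared reflectance."""
    return normalized_difference(green, swir)

def split_samples(n_fire, ratio=1.5, train_fraction=0.8, seed=0):
    """Label fire (1) and non-fire (0) samples and split them at random.

    The 1:1.5 ratio and the 80/20 split follow the text; the seed is an
    arbitrary choice made here for reproducibility.
    """
    n_non_fire = int(round(n_fire * ratio))
    labels = np.concatenate([np.ones(n_fire, dtype=int),
                             np.zeros(n_non_fire, dtype=int)])
    order = np.random.default_rng(seed).permutation(labels.size)
    n_train = int(round(train_fraction * labels.size))
    return labels, order[:n_train], order[n_train:]

# Hypothetical 2 x 2 reflectance patches, not data from the study.
nir = np.array([[0.5, 0.4], [0.0, 0.3]])
red = np.array([[0.1, 0.4], [0.0, 0.1]])
ndvi_patch = ndvi(nir, red)

labels, train_idx, test_idx = split_samples(368)
print(labels.size, train_idx.size, test_idx.size)  # 920 736 184
```

With 368 fire events, the 1:1.5 ratio gives 552 non-fire samples, so the 80/20 split yields 736 training and 184 held-out samples.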

ANN:
An ANN consists of layers of neurons that each implement a linear combination followed by a non-linear activation function (Negnevitsky, 2002). The hyperparameters of the ANN model trained here were set manually, based on the susceptibility map and the AUC value of the result. The Keras and TensorFlow (Abadi et al. 2016) libraries were used to implement the ANN method. The network had two hidden layers, both activated with the "relu" function. The output of these hidden layers was passed through the "sigmoid" function to calculate the predictions. Lastly, the "adam" optimizer was used when compiling the ANN.
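The layer structure described above (two ReLU hidden layers followed by a sigmoid output) can be illustrated with a plain NumPy forward pass. This is not the Keras model used in the study: the hidden layer widths (16 and 8), the random weights, and the function names are assumptions made for this sketch, and training with the Adam optimizer is not shown.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_inputs=12, hidden=(16, 8), seed=0):
    """Create random (weights, bias) pairs; the layer widths are hypothetical."""
    rng = np.random.default_rng(seed)
    sizes = (n_inputs, *hidden, 1)
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Linear combination plus ReLU in each hidden layer, sigmoid at the output."""
    h = np.asarray(x, dtype=float)
    for w, b in params[:-1]:
        h = relu(h @ w + b)
    w, b = params[-1]
    return sigmoid(h @ w + b).ravel()

params = init_params()
# Five hypothetical samples with the 12 input features scaled to [0, 1].
x = np.random.default_rng(1).random((5, 12))
probabilities = forward(params, x)
print(probabilities.shape)  # (5,)
```

Because the last activation is a sigmoid, each output can be read as a fire susceptibility value between 0 and 1.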

RF:
The RF (Breiman 2001) is a supervised classification method consisting of an ensemble of decision trees. Each tree classifies the input according to a sequence of thresholds on the input features, learned from the training data.
The RF was implemented using the scikit-learn Python library (Pedregosa et al. 2011). Parameter selection was performed via manual testing based on the accuracy results. In the RF, the maximum depth of the trees was set to sixteen, the minimum number of samples required to be at a leaf node to four, and the minimum number of samples at the leaves to twelve.
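A configuration along these lines could look as follows in scikit-learn. This is a hedged sketch and not the authors' script: the data are synthetic, the number of trees and the random seeds are arbitrary, and mapping the value twelve to `min_samples_split` is an assumption, since the text does not name the parameter unambiguously.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 12 input features (920 samples, as implied by
# 368 fire points and a 1:1.5 ratio); these are not data from the study.
rng = np.random.default_rng(0)
X = rng.random((920, 12))
y = (X[:, 0] + 0.1 * rng.normal(size=920) > 0.6).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

rf = RandomForestClassifier(
    n_estimators=100,      # assumption: the number of trees is not reported
    max_depth=16,          # maximum tree depth stated in the text
    min_samples_leaf=4,    # minimum samples at a leaf node stated in the text
    min_samples_split=12,  # assumption: one reading of the value twelve
    random_state=0,
)
rf.fit(X_train, y_train)

# Probability of the positive (fire) class, used as the susceptibility value.
susceptibility = rf.predict_proba(X_test)[:, 1]
print(susceptibility.min() >= 0.0, susceptibility.max() <= 1.0)
```

The class probability returned by `predict_proba` is the fraction of tree votes (averaged leaf probabilities) for the fire class, which is what a susceptibility map would display per pixel.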

Validation:
The test data was used for performance assessment via the ROC curve, which contrasts recall or sensitivity (Eq. 3) and specificity (Eq. 4). The F-1 score (Eq. 5) and the OA value (Eq. 6) were also computed. Visual inspection of the predicted maps was carried out by the authors as a further assessment. The importance of different features was analyzed using the leave-one-out approach, which is based on the removal of one feature from the full set at the learning stage. As a further analysis, the importance of the features was also calculated using

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{4}$$

$$\mathrm{F1} = \frac{2\,TP}{2\,TP + FP + FN} \tag{5}$$

$$\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

where TP is true positive, FN is false negative, TN is true negative, and FP is false positive.
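The four measures can be computed directly from the confusion-matrix counts. The short sketch below uses the standard definitions of recall, specificity, F-1 score and OA; the function name and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute recall, specificity, F-1 score and overall accuracy from counts."""
    return {
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "oa": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fn=10, tn=60, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# recall: 0.800
# specificity: 0.800
# f1: 0.762
# oa: 0.800
```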

RESULTS AND DISCUSSIONS
In the following, the input feature maps obtained from the preprocessing, the susceptibility maps produced with the RF and the ANN, the validation results, and the feature importance analysis are presented and discussed.

Input Features:
The input features obtained from data preprocessing were categorized as topographic, anthropogenic, LULC, vegetation and hydrological (Table 2). Feature maps are shown in Figure A1 in the Appendix.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-M-1-2023, 39th International Symposium on Remote Sensing of Environment (ISRSE-39) "From Human Needs to SDGs", 24-28 April 2023, Antalya, Türkiye

Model Validation Results:
The AUC values obtained from the RF and the ANN were 0.89 and 0.88, respectively (Figure 8). The corresponding OA values were 0.86 and 0.80, and the F-1 scores were 0.81 and 0.75. The RF exhibited better prediction performance based on the F-1 score and the OA, which indicates a better fit of the model to the learning data. On the other hand, the ANN classified larger areas as fire-prone (see Figure 7), which needs to be investigated in detail.
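For readers less familiar with the AUC, it equals the probability that a randomly chosen fire sample receives a higher predicted score than a randomly chosen non-fire sample. A minimal sketch of this pairwise definition follows; the function name and the four toy scores are illustrative and unrelated to the reported results.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Toy example: three of the four positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

The pairwise form is quadratic in the number of samples, which is acceptable for a test set of a few hundred points; library implementations use a sorted, threshold-based computation instead.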

Feature Importance:
Based on the feature importance analysis with the leave-one-out approach, elevation was found significant for both models (Figure 9, Tables A1 and A2 in the Appendix). However, the ANN found forest type to be the most important feature based on the F-1 score. When the results presented here are compared with the study of Kantarcioglu et al. (2023) carried out in the Thrace region, Türkiye, the accuracy was better in the latter (AUC of ANN = 0.94 and RF = 0.95), possibly due to the use of an extensive local database including forest roads and a larger sample size (1506). In addition, although the coverages of both sites are comparable, the environmental and anthropogenic characteristics of the two regions differ.
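The leave-one-out procedure can be sketched independently of the classifier: the model is retrained once per removed feature, and the drop in the chosen score (for example OA or F-1) relative to the full feature set is recorded. In the sketch below, `evaluate` is a hypothetical callable standing in for "train and score a model on these features", and the toy contributions are invented numbers used only to show the mechanics.

```python
def leave_one_out_importance(feature_names, evaluate):
    """Rank features by the score drop observed when each one is removed.

    `evaluate` takes a list of feature names and returns a score such as OA.
    """
    baseline = evaluate(list(feature_names))
    drops = {}
    for name in feature_names:
        subset = [f for f in feature_names if f != name]
        drops[name] = baseline - evaluate(subset)
    ranking = sorted(drops.items(), key=lambda item: item[1], reverse=True)
    return baseline, ranking

# Toy stand-in for model training: each feature adds a fixed, made-up amount.
TOY_CONTRIBUTION = {"elevation": 0.10, "forest_type": 0.06, "slope": 0.02}

def toy_evaluate(features):
    """Return an invented score; a real version would retrain the RF or ANN."""
    return 0.60 + sum(TOY_CONTRIBUTION[f] for f in features)

baseline, ranking = leave_one_out_importance(list(TOY_CONTRIBUTION), toy_evaluate)
for name, drop in ranking:
    print(f"{name}: score drop {drop:.2f}")
```

A larger drop means the model relied more on that feature. With correlated inputs the drops can understate importance, because the remaining features partly compensate for the removed one.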

CONCLUSIONS AND FUTURE WORK
In this study, forest fire susceptibility was assessed over a region comprising four provinces in north-eastern Turkiye, using two different ML methods, namely the RF and the ANN. The forest fire inventory obtained from the regional forest authority was used to identify reference ignition locations (368). Features such as elevation, land cover, hydrology, forest, and power lines were obtained from openly available datasets. The tested methods achieved comparable performances, with AUC values of 0.89 and 0.88, respectively. Yet, the ANN classified larger areas as fire-prone.
Based on the leave-one-out analysis, elevation was found to be the most influential factor. Forest type, aspect and LULC were also found highly significant, which demonstrates that vegetation characteristics, topography and anthropogenic factors need to be considered for producing accurate forest fire susceptibility maps. Yet, the results presented here are preliminary and require further investigation, especially with respect to the different LULC types.

Figure 2 .
Trabzon and Rize are more vegetated compared to Gümüşhane and Bayburt, according to European Space Agency (ESA) WorldCover (2023) data (Figure 3). The Pontic mountains located in the north of the region differentiate the climate and vegetation characteristics of the provinces.

Figure 1. Location map of the study area, and ground truth fire ignition points.

Figure 4. The workflow of the study.

Figure 5. The distribution of training and test samples.

Forest Fire Susceptibility Maps:
The maps obtained from both the ANN and the RF methods are shown in Figure 6. The RF predicted higher susceptibility values (up to 0.96). Both maps show similar spatial distributions overall. However, there are local differences in different land cover types (water, forest, roads, settlements), as illustrated in Figure 7. The map details shown in the figure also indicate that the ANN predicts larger areas as fire-prone.
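Producing such a map amounts to applying the trained classifier to every grid cell of the stacked feature rasters. The sketch below shows one way to do this with NumPy; the array layout (features, rows, columns), the function names, and the toy scoring function are assumptions made for illustration and do not describe the authors' implementation.

```python
import numpy as np

def predict_susceptibility_map(feature_stack, predict_proba, nodata_mask=None):
    """Apply a per-pixel probability model to a (features, rows, cols) stack.

    `predict_proba` takes an array of shape (n_pixels, n_features) and returns
    one probability per pixel. Masked pixels are set to NaN in the output.
    """
    n_features, height, width = feature_stack.shape
    pixels = feature_stack.reshape(n_features, -1).T  # (rows * cols, features)
    probabilities = np.asarray(predict_proba(pixels), dtype=float)
    susceptibility = probabilities.reshape(height, width)
    if nodata_mask is not None:
        susceptibility = np.where(nodata_mask, np.nan, susceptibility)
    return susceptibility

def toy_model(pixels):
    """Hypothetical stand-in for a trained RF or ANN; uses the first feature only."""
    return 1.0 / (1.0 + np.exp(-(pixels[:, 0] - 0.5)))

# Hypothetical 4 x 5 grid with 12 feature layers scaled to [0, 1].
stack = np.random.default_rng(0).random((12, 4, 5))
susceptibility_map = predict_susceptibility_map(stack, toy_model)
print(susceptibility_map.shape)  # (4, 5)
```

For a study area of this size at 25 m resolution, the pixel array would typically be processed in tiles to limit memory use.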

Figure 6. Forest fire susceptibility maps obtained from the ANN (above) and the RF (below).

Figure 8. ROC curves obtained for the RF and for the ANN.

Figure 9. Leave-one-out analysis results ordered by OA values.

Figure A1. Feature maps used in the modelling.

Table 1. Input data characteristics.

Table 2. Input features implemented in the study.