Flood risk mapping and performance efficiency evaluation of machine learning algorithms: Best practice in northern Iran
Keywords: Flood, Machine Learning, GIS, Risk Mapping, Performance Efficiency
Abstract. Flooding is one of the most devastating natural hazards, and inadequate management can amplify its social, economic, and environmental consequences. Accurate and efficient flood risk mapping is essential for mitigating these effects and supporting disaster management. However, it remains difficult to optimize the accuracy and reliability of machine learning (ML) algorithms for flood susceptibility assessment. In this study, we applied five ML algorithms, Random Forest (RF), Extreme Gradient Boosting (XGBoost), LightGBM, CatBoost, and Support Vector Machine (SVM), to develop flood risk maps for a region in northern Iran. We selected thirteen environmental and geographical parameters that influence flood susceptibility: elevation derived from a Digital Elevation Model (DEM), slope, aspect, Topographic Wetness Index (TWI), Stream Power Index (SPI), distance to rivers, river density, rainfall, lithology, Normalized Difference Vegetation Index (NDVI), Normalized Difference Moisture Index (NDMI), soil texture, and land use. Data processing, feature extraction, and model training were conducted using Python, Google Earth Engine, and ArcGIS. The models performed consistently, with Area Under the Curve (AUC) values ranging from 0.82 to 0.87. XGBoost achieved the highest AUC (0.87), followed by CatBoost (0.86) and by RF and LightGBM (0.85 each), while SVM recorded the lowest AUC (0.82). These findings indicate that ML algorithms, particularly tree-based ensemble methods, perform robustly in flood risk mapping, including in complex environmental settings.
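The model comparison summarized in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic (generated with scikit-learn's make_classification, using thirteen features only to mirror the number of predictors), the function name compare_auc and all hyperparameters are hypothetical, and scikit-learn's GradientBoostingClassifier is used as a stand-in for the boosted-tree libraries (XGBoost, LightGBM, CatBoost) so that the example depends on a single package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_auc(random_state=0):
    """Fit several classifiers on synthetic data and return their test AUCs."""
    # Synthetic stand-in for flood / non-flood samples with 13 predictors
    # (assumption: the real study uses raster-derived features instead).
    X, y = make_classification(
        n_samples=600, n_features=13, n_informative=6, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )

    models = {
        "RF": RandomForestClassifier(n_estimators=200, random_state=random_state),
        # Stand-in for XGBoost / LightGBM / CatBoost (assumption, see lead-in).
        "GB": GradientBoostingClassifier(random_state=random_state),
        # SVMs are scale-sensitive, so standardize features first.
        "SVM": make_pipeline(
            StandardScaler(), SVC(probability=True, random_state=random_state)
        ),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # AUC is computed from the predicted probability of the positive class.
        proba = model.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, proba)
    return scores


auc_scores = compare_auc()
for name, auc in sorted(auc_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUC = {auc:.2f}")
```

The AUC values printed here describe the synthetic data only and are not comparable to the values reported in the abstract. Evaluating every model on the same held-out split is what makes the ranking meaningful.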