Predicting Leishmaniasis Risk in Morocco Using Machine Learning, GIS, and Domain Adaptation: A Case Study of the Beni Mellal-Khenifra Region
Keywords: Leishmaniasis, Machine Learning, Geographic Information System (GIS), Risk Mapping, Domain Adaptation, Spatial Prediction
Abstract. Leishmaniasis remains a persistent global public health challenge, particularly in regions where ecological and socioeconomic conditions favor vector proliferation and disease transmission. In Morocco, the provinces of Beni Mellal and Khenifra are among the most severely affected, which calls for spatial prediction tools to guide disease control strategies. This study integrated machine learning techniques and Geographic Information System (GIS) technologies to develop a predictive framework for cutaneous leishmaniasis risk mapping. A spatial database was constructed by combining reported case data from 2011 to 2018 with key environmental and climatic variables: temperature, precipitation, normalized difference vegetation index (NDVI), altitude, slope, and wind speed. Three machine learning algorithms, Support Vector Regression (SVR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were evaluated for their predictive performance, and the CORrelation ALignment (CORAL) method was applied as a domain adaptation strategy to address distributional differences between training and target regions. XGBoost achieved the highest predictive accuracy (R² = 0.91, MSE = 0.1229, MAE = 0.2587), followed by SVR (R² = 0.89, MSE = 0.1434, MAE = 0.2765) and RF (R² = 0.85, MSE = 0.1925, MAE = 0.3120). Incorporating CORAL significantly improved model generalizability and stability across ecologically diverse zones. The final risk map identified high-risk clusters in central and northern Beni Mellal and Khenifra, offering actionable guidance for spatially targeted interventions. This study presents a scalable, GIS-integrated machine learning framework with strong potential for application in other data-scarce, high-risk regions. Future research should incorporate real-time data and advanced deep learning techniques to further improve predictive power.
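To make the domain adaptation step concrete, the following is a minimal sketch of covariance alignment in the spirit of CORAL, not the exact implementation used in this study. It whitens the source (training-region) features and re-colors them with the target-region covariance. The function names `coral_align` and `_sym_matrix_power`, the small ridge term `eps`, and the final shift to the target mean are illustrative assumptions; the feature matrices are assumed to hold one row per location and one column per environmental variable.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power
    using its eigendecomposition (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # V @ diag(eigvals**power) @ V.T, written with broadcasting.
    return (eigvecs * eigvals**power) @ eigvecs.T


def coral_align(source, target, eps=1e-6):
    """Align second-order statistics of `source` features to `target`.

    Assumption: `eps` is a small ridge added to both covariance
    matrices so that they stay invertible.
    Assumption: the aligned features are shifted to the target mean;
    the basic CORAL formulation aligns covariances only.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    ridge = eps * np.eye(source.shape[1])

    cov_s = np.cov(source, rowvar=False) + ridge
    cov_t = np.cov(target, rowvar=False) + ridge

    # Whiten the centered source features, then re-color them
    # with the square root of the target covariance.
    centered = source - source.mean(axis=0)
    whitened = centered @ _sym_matrix_power(cov_s, -0.5)
    recolored = whitened @ _sym_matrix_power(cov_t, 0.5)
    return recolored + target.mean(axis=0)
```

Under these assumptions, a regressor such as SVR, RF, or XGBoost would be fitted on `coral_align(X_train, X_target)` with the original training labels and then applied to the unmodified target-region features `X_target` (both variable names are hypothetical).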
