Identifying and predicting climate change impact on vector-borne disease using machine learning: Case study of Plasmodium falciparum from Africa

Vector-borne diseases pose a significant threat to human health, particularly in regions vulnerable to climate change. Among these diseases, malaria, caused by the parasite Plasmodium falciparum and transmitted by Anopheles mosquitoes, remains a major global health concern, especially in sub-Saharan Africa. This study explores the use of machine learning techniques to identify and predict the impact of climate change on the transmission dynamics of P. falciparum malaria in Africa. The research combines climate data, epidemiological records, and machine learning algorithms to analyze historical patterns and project future trends in malaria transmission. Key climate variables such as temperature, precipitation, humidity, and vegetation cover are integrated into predictive models to assess their influence on the abundance and distribution of mosquito vectors and on the parasite's life cycle. By applying machine learning models such as Maximum Entropy (Maxent), this study aims to uncover complex relationships between climatic factors and malaria transmission dynamics. Once trained on historical data, these models can be used to project malaria risk under various climate change scenarios. The findings of this research will provide valuable insights into the potential impact of climate change on the spatial and temporal distribution of P. falciparum malaria in Africa. Such insights are crucial for designing targeted interventions and adaptation strategies to mitigate the anticipated rise in malaria cases and the associated morbidity and mortality in the region. Moreover, the methodology developed in this study can serve as a framework for assessing and addressing the impact of climate change on other vector-borne diseases globally.


Introduction
Vector-borne diseases, such as malaria, dengue fever, Zika virus disease, and Lyme disease, pose significant public health challenges worldwide, particularly in regions where environmental conditions favor the proliferation of disease vectors. Among these diseases, malaria remains a major global health concern, with the majority of cases occurring in sub-Saharan Africa. Climate change is increasingly recognized as a key driver of the distribution, abundance, and transmission dynamics of vector-borne diseases. In recent years, there has been growing interest in using machine learning techniques to better understand and predict the impact of climate change on these diseases, particularly malaria caused by the parasite Plasmodium falciparum. Studies examining the relationship between climate change and vector-borne diseases have a history dating back several decades. Early research relied mainly on statistical models to assess the association between climatic variables and disease incidence. These studies laid the groundwork for understanding the complex interactions between climate, vectors, hosts, and pathogens in disease transmission cycles. However, traditional statistical methods are often limited in their ability to capture nonlinear relationships and complex interactions within large, heterogeneous datasets.
Seasonal variation in vector abundance strongly shapes the seasonal dynamics and geographic distribution of vector-borne parasites (Chavasse et al., 1999; Emerson, Bailey, Mahdi, Walraven, & Lindsay, 2000; Mabaso, Craig, Vounatsou, & Smith, 2005). For example, malaria is one of the vector-borne diseases whose incidence often follows seasonal patterns of rainfall and temperature in various regions, such as the Kenyan Highlands (Hay et al., 2002). Vector-borne diseases have become increasingly common in recent years in many parts of the world, including Saudi Arabia (dengue) (Alkhaldy and Barnett, 2021), Senegal (chikungunya) (Dieng et al., 2022), Brazil (high incidence of dengue, Zika, and chikungunya) (Lisboa et al., 2022), and China (dengue fever outbreaks in epidemic areas) (Zhang et al., 2022). The survival of disease vectors generally depends on the food, water, and shelter available at particular geographic locations, but vectors are now overcoming geographical barriers and becoming established in other regions of the globe. For example, Aedes aegypti, the main vector of dengue, Zika, and chikungunya in the Americas (WHO, 2023), is now widespread across tropical and subtropical regions worldwide. Five species of Plasmodium protozoa cause malaria in humans, namely P. falciparum, P. vivax, P. malariae, P. ovale, and, most recently recognized, P. knowlesi, all of which are transmitted by mosquitoes. More than 90% of malaria-related deaths worldwide are attributed to P. falciparum infection, and the disease continues to pose a serious threat to public health on a global scale (Snow, 2015). According to the World Health Organization's (WHO) 2019 World Malaria Report, there were 228 million cases of malaria worldwide in 2018, resulting in 405 000 deaths, many of them among children under the age of 5.
Approximately 40% of the world's population is at risk of malaria, which is endemic in more than 90 countries (Garcia, 2010). In vector-borne disease epidemiology, therefore, temporal variation together with spatial distribution is the central concern in understanding infectious disease transmission. Numerous studies have explored the association between environmental variables and vector-borne diseases, and this topic is gaining recognition as a driver of disease spread in rural and diverse urban environments (Abd Majid et al., 2019; Almeida et al., 2020; Cordeiro et al., 2011; Khalid & Ghaffar, 2015; Maftei et al., 2021; Rose et al., 2020; Tunali et al., 2021). Knowledge of the variables that influence the spread of mosquitoes throughout the world is of utmost importance, as it can be used to develop preventive models (Sriklin et al., 2021). It is therefore necessary to identify the predictor variables associated with the presence of P. falciparum, so that more targeted measures can be applied for epidemiological surveillance, disease prevention, and health promotion. Recently, various machine learning (ML) algorithms, such as maximum entropy modelling (Maxent), random forests, and artificial neural networks, have been used successfully to assess the risk posed by invasive species (Abdulkareem et al., 2021; Hu et al., 2020; Kaur et al., 2022; Mbunge et al., 2022; Nkiruka et al., 2021; Peters et al., 2020). In this study, the environmental conditions associated with the occurrence of P. falciparum in Benin are evaluated, and its potential distribution is predicted for risk assessment using a commonly applied machine learning algorithm, maximum entropy (Maxent), to help establish surveillance and prevention programs. This study thus aims to propose a methodology for epidemiological surveillance, prevention of vector-borne diseases, and promotion of public health.
In recent years, the emergence of machine learning techniques has provided new opportunities to overcome the limitations of traditional statistical approaches in modeling the impact of climate change on vector-borne diseases. Machine learning algorithms such as Random Forest, Support Vector Machines, Gradient Boosting, and Neural Networks offer powerful tools for analyzing large, multidimensional datasets, identifying complex patterns, and making accurate predictions. Several studies have applied machine learning techniques to assess the impact of climate change on malaria transmission dynamics, with a particular focus on P. falciparum in Africa. These studies have used diverse datasets, including climate data from remote sensing satellites, epidemiological records, land cover data, and entomological surveys. By integrating these datasets and applying machine learning algorithms, researchers have been able to identify key environmental factors influencing mosquito abundance, parasite development, and disease transmission. While machine learning holds promise for improving our understanding of the impact of climate change on vector-borne diseases, several challenges remain. These include the need for high-quality, spatially explicit data, the need to address data bias and uncertainty, and the need to ensure that machine learning models are interpretable and generalizable. Additionally, interdisciplinary collaboration between climatologists, epidemiologists, entomologists, and data scientists is needed to leverage machine learning techniques effectively for disease modeling and prediction.

Research Impact
The discussion surrounding the identification and prediction of climate change impacts on vector-borne diseases, focusing on Plasmodium falciparum malaria in Africa and employing machine learning techniques, encompasses several key points.

Methodological Advances and Contributions:
The use of machine learning techniques represents a significant advance in disease ecology and epidemiology. By leveraging sophisticated algorithms, researchers can analyze large and complex datasets to uncover intricate relationships between climatic variables and disease transmission dynamics. The case study of P. falciparum malaria in Africa illustrates how machine learning can be applied to pressing public health challenges in regions highly susceptible to climate change.

Insights into Climate-Vector-Pathogen Interactions:
Through the integration of climate data, epidemiological records, and entomological surveys, machine learning models can provide valuable insights into the intricate interactions between climate, vectors (e.g., Anopheles mosquitoes), and pathogens (e.g., Plasmodium parasites). These models can identify key environmental factors influencing vector abundance, parasite development, and disease transmission, thereby elucidating the underlying mechanisms driving malaria dynamics in Africa.
Predictive Capabilities and Future Projections:
One of the primary strengths of machine learning approaches is their ability to make predictions from historical data. By training models on past climate and disease data, researchers can project disease distributions under various climate change scenarios. These projections are essential for anticipating the potential impact of climate change on malaria transmission dynamics, guiding the development of targeted intervention strategies, and informing public health policies in endemic regions.
Challenges and Limitations:
Despite their promise, machine learning techniques also have challenges and limitations. These include the need for high-quality, spatially explicit data, issues related to data bias and uncertainty, and concerns about the interpretability and generalizability of models. Addressing these challenges requires interdisciplinary collaboration between climatologists, epidemiologists, entomologists, and data scientists, as well as ongoing efforts to improve data collection, data validation, and model validation.

Implications for Public Health and Adaptation Strategies:
The findings derived from machine learning models have significant implications for public health and adaptation strategies in malaria-endemic regions of Africa. By identifying areas at high risk of malaria transmission under different climate change scenarios, policymakers can prioritize resource allocation, implement targeted vector control measures, and strengthen healthcare infrastructure to mitigate the anticipated rise in malaria cases and associated morbidity and mortality. Additionally, machine learning can facilitate the development of early warning systems for malaria outbreaks, enabling proactive responses to emerging threats.

Methodology
This study is carried out in four successive steps: 1) identification of the endemic site and collection of P. falciparum occurrence data at that site, 2) evaluation of the environmental factors that influence the incidence of P. falciparum, 3) construction of a predictive model using ML, and 4) analysis of the contribution of each environmental factor to P. falciparum occurrence in the model.

Data Collection and Preprocessing
The study area is Benin, in West Africa. P. falciparum is the deadliest malaria parasite and poses the greatest malaria threat on the African continent. Current environmental data for the African continent are obtained from the WorldClim project (http://worldclim.org), which provides high-resolution global climate layers. For this study, the bioclimatic variables are used at a spatial resolution of 2.5 arc-minutes. Occurrence records of P. falciparum are then obtained from the Global Biodiversity Information Facility (GBIF) repository (GBIF.org, 2023), a global repository that holds compiled records for millions of species worldwide. The occurrences are pre-processed to remove duplicate points and missing (NA) values.
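As an illustration of the occurrence-cleaning step, the Python sketch below builds a query for GBIF's public occurrence search API and then drops records with missing coordinates and exact duplicate points. It is a hedged example, not the study's actual workflow: the query values (Benin's ISO country code `BJ`, a page size of 300) and the helper names `gbif_query_url` and `clean_occurrences` are assumptions, the network request itself is omitted, and only exact coordinate duplicates are removed. `decimalLongitude` and `decimalLatitude` are the Darwin Core coordinate fields carried by GBIF records.

```python
import math
from urllib.parse import urlencode

# GBIF's public occurrence search endpoint. The query values below are
# illustrative assumptions, not the exact query used in this study.
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_query_url(species="Plasmodium falciparum", country="BJ", limit=300):
    """Build an occurrence search URL (BJ is the ISO 3166-1 code for Benin)."""
    params = {
        "scientificName": species,
        "country": country,
        "hasCoordinate": "true",  # only records that carry coordinates
        "limit": limit,
    }
    return GBIF_SEARCH + "?" + urlencode(params)


def _is_missing(value):
    """True for None or a float NaN, i.e. an NA coordinate."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_occurrences(records):
    """Drop records with NA coordinates and exact duplicate points.

    `records` is a list of dicts keyed by the Darwin Core fields
    'decimalLongitude' and 'decimalLatitude'. Returns (lon, lat) tuples
    in order of first appearance.
    """
    seen = set()
    cleaned = []
    for rec in records:
        lon = rec.get("decimalLongitude")
        lat = rec.get("decimalLatitude")
        if _is_missing(lon) or _is_missing(lat):
            continue  # NA value: the point cannot be placed on the climate grid
        point = (float(lon), float(lat))
        if point in seen:
            continue  # duplicate point
        seen.add(point)
        cleaned.append(point)
    return cleaned
```

The cleaned (longitude, latitude) pairs would then be used to extract the bioclimatic values at each occurrence before modelling.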

Model Construction
With the occurrence records and environmental predictors prepared, the model can be constructed and used to predict the potential range of P. falciparum. Before doing so, it is necessary to establish how well the model performs. Some of the occurrences are therefore reserved as test data for evaluating the model, and the remainder are used as training data. In this study, 20% of the observations are randomly withheld as test data, and the remaining 80% are used for training.
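To make the holdout and the model-fitting step concrete, the following is a minimal Python sketch under simplifying assumptions; it is not the Maxent software used in this study. `train_test_split` randomly withholds 20% of the occurrence points, and `fit_maxent` fits a maximum-entropy (Gibbs) distribution over background points using linear features only, with an L1 penalty applied by soft-thresholding. The Maxent program itself also uses quadratic, product, hinge, and threshold features and its own regularization settings. The function names, learning rate, penalty value, iteration count, and the assumption that predictors are already scaled to comparable ranges are all hypothetical.

```python
import math
import random


def train_test_split(points, test_fraction=0.2, seed=42):
    """Randomly withhold `test_fraction` of the occurrences as test data."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def score(lam, x):
    """Linear predictor lambda . f(x); the model sets q(x) proportional to exp(score)."""
    return sum(wk * v for wk, v in zip(lam, x))


def fit_maxent(presence, background, l1=0.01, lr=0.5, iters=500):
    """Fit a linear-feature Maxent model by proximal gradient ascent.

    presence, background: sequences of equal-length feature vectors, assumed
    to be already scaled (e.g. to [0, 1]). Returns one weight per variable.
    """
    d = len(background[0])
    lam = [0.0] * d
    # Empirical mean of each feature over the training presences.
    emp = [sum(p[k] for p in presence) / len(presence) for k in range(d)]
    for _ in range(iters):
        # Gibbs distribution q(x) over the background points.
        s = [score(lam, b) for b in background]
        top = max(s)  # subtract the max before exp() for numerical stability
        w = [math.exp(si - top) for si in s]
        z = sum(w)
        expected = [sum(wi * b[k] for wi, b in zip(w, background)) / z
                    for k in range(d)]
        for k in range(d):
            # Gradient of the mean log-likelihood: empirical minus expected mean.
            step = lam[k] + lr * (emp[k] - expected[k])
            # Soft-thresholding applies the L1 (lasso-style) penalty.
            lam[k] = math.copysign(max(abs(step) - lr * l1, 0.0), step)
    return lam
```

In use, the training split of the occurrence feature vectors would be passed to `fit_maxent` together with background feature vectors, and `score` would then rank the withheld test points; only the ordering of scores matters for the AUC evaluation that follows.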

Model Assessment
The Area Under the Curve (AUC) of the receiver operating characteristic (ROC), i.e., the graph of sensitivity versus 1 - specificity generated by the Maxent model, summarizes how well the model discriminates occurrence locations from background locations and thus indicates its fit. AUC values close to 1.0 indicate good model performance, whereas a value of 0.5 indicates performance no better than a random model. However, if the species has a wide distribution, AUC values may be lower because of greater commission error. The AUC of this Maxent model is 0.871, indicating that the model performs better than random, as shown in Figure 1.

Conclusions
In conclusion, the application of machine learning techniques to identify and predict the impact of climate change on vector-borne diseases, such as malaria caused by P. falciparum in Africa, represents a promising avenue for future research. By integrating diverse datasets and employing advanced modeling approaches, researchers can gain valuable insights into the complex interactions between climate, vectors, and pathogens, thereby informing evidence-based strategies for disease control and adaptation to changing environmental conditions. The modelling approach proposed here helps to explain the widespread prevalence of the malaria parasite and could be used to build preventive models. Analysis of variable importance and partial dependence plots identified the climatic parameters that most strongly influenced the prevalence of P. falciparum. The model's predicted risk map of malaria parasite incidence closely tracked actual field data, demonstrating its strong capacity for prediction and management. The model also indicated that the parasite's incidence could expand across the African continent wherever similar climatic conditions occur. This information could therefore be considered when developing management and control strategies for malaria.
Despite the challenges and limitations associated with these approaches, continued research efforts and interdisciplinary collaboration are essential for harnessing the full benefits of machine learning in the fight against vector-borne diseases in the context of a changing climate.

Figure 2. Variable contributions in the model
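The AUC value reported under Model Assessment and the variable contributions summarized in Figure 2 can both be illustrated with a short sketch. The Python below is an illustration under stated assumptions, not the Maxent software's own code: `auc` computes the area under the ROC curve in its rank (Mann-Whitney) form, i.e., the probability that a randomly chosen presence receives a higher score than a randomly chosen background point, and `permutation_importance` follows the idea behind Maxent's permutation importance by shuffling one variable at a time and recording the resulting drop in AUC, normalized to percentages. The function names, the `predict` callable, and the choice to shuffle each variable across the pooled presence and background points are hypothetical.

```python
import random


def auc(presence_scores, background_scores):
    """AUC as P(random presence outscores random background); ties count 1/2.

    This rank form equals the area under the sensitivity vs. 1 - specificity curve.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def permutation_importance(predict, presence, background, seed=0):
    """Shuffle each variable across pooled points; report AUC drops as percentages.

    predict: hypothetical callable mapping a feature vector to a suitability score.
    Returns (baseline AUC, list of percentage contributions per variable).
    """
    rng = random.Random(seed)
    base = auc([predict(x) for x in presence], [predict(x) for x in background])
    pooled = [list(x) for x in list(presence) + list(background)]
    n_pres = len(presence)
    drops = []
    for k in range(len(pooled[0])):
        column = [row[k] for row in pooled]
        rng.shuffle(column)  # break the link between variable k and the labels
        permuted = [row[:k] + [v] + row[k + 1:] for row, v in zip(pooled, column)]
        new_auc = auc([predict(x) for x in permuted[:n_pres]],
                      [predict(x) for x in permuted[n_pres:]])
        drops.append(max(base - new_auc, 0.0))  # ignore chance improvements
    total = sum(drops)
    return base, [100.0 * (d / total) if total else 0.0 for d in drops]
```

Evaluating `auc` on the withheld test presences gives a test AUC comparable in spirit to the value reported above. Maxent also reports a separate "percent contribution" measure computed during training, which this sketch does not reproduce.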