Assessment of Machine Learning Models for Predicting Aboveground Biomass in the Indian Subcontinent
Keywords: Aboveground biomass, Machine learning models, Random Forest, Vegetation indices, Biomass distribution
Abstract. Understanding the distribution of aboveground biomass (AGB) is vital for evaluating carbon stocks and ecosystem dynamics, especially in regions with diverse landscapes such as the Indian subcontinent. This study evaluates three machine learning models for predicting AGB across the subcontinent: Random Forest (RF), Gradient Tree Boosting (GTB), and Classification and Regression Trees (CART). The dependent variable in these models is AGB, while the independent variables comprise a range of vegetation, land cover, and topographic layers: Normalized Difference Vegetation Index, Enhanced Vegetation Index, Leaf Area Index, Fraction of Photosynthetically Active Radiation, land cover, elevation, aspect, slope, and hillshade. These predictors capture the ecological and topographical characteristics that influence biomass distribution. Predictive accuracy was assessed using the coefficient of determination (R²) and Pearson's correlation coefficient (r). RF emerged as the most accurate model, with an R² of 0.834 and an r of 0.913, effectively capturing the spatial variability in AGB across the subcontinent's diverse ecosystems; it was then used to predict AGB for 2023. The predictions reveal marked spatial variation in biomass density, reflecting the region's diverse ecological zones and land-use patterns. In India, high biomass densities are found in the Himalayan foothills, the northeastern states, and the Western Ghats, while arid regions such as Rajasthan and Gujarat have lower values. Pakistan generally exhibits low biomass densities, with higher values near its northern border with India. Nepal and Bhutan show high densities in their forested regions, particularly in the mid-hills, the high mountains, and the Eastern Himalaya. Bangladesh has moderate to low biomass densities. In Sri Lanka, the central highlands and southwestern rainforests have the highest biomass densities, while the more arid northern and eastern regions exhibit lower values.
This study highlights the importance of using robust machine learning models such as RF to accurately capture spatial patterns of biomass distribution, which is crucial for forest management, carbon accounting, and biodiversity conservation in the Indian subcontinent.
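As an illustration of the two accuracy metrics named in the abstract, the following minimal Python sketch computes R² and Pearson's r from paired observed and predicted values. It assumes R² is defined as 1 - SS_res / SS_tot, which the abstract does not state. The function names and AGB values are hypothetical and are not taken from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(observed, predicted):
    """Pearson's correlation coefficient between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(var_obs * var_pred)


# Hypothetical AGB values (Mg/ha) for illustration only; not data from the study.
observed = [120.0, 45.0, 210.0, 15.0, 90.0]
predicted = [110.0, 60.0, 190.0, 30.0, 95.0]

print("R2:", round(r_squared(observed, predicted), 3))
print("r: ", round(pearson_r(observed, predicted), 3))
```

Under this definition the two metrics answer different questions: r measures only how closely predictions track observations linearly, whereas R² also penalizes systematic bias. A model that overestimates every pixel by a constant amount keeps r = 1 but has R² below 1.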