ESTIMATION OF SUGARCANE YIELD USING MULTI-TEMPORAL SENTINEL 2 SATELLITE IMAGERY AND RANDOM FOREST REGRESSION
Keywords: Random forest regression, Recursive feature elimination, Sugarcane yield estimation, Sentinel 2
Abstract. Advancements in remote sensing techniques have greatly enhanced crop monitoring and yield estimation, with spectral vegetation indices (VIs) serving as a key component. Our study investigates the use of Sentinel-2 (S2) data, notable for its red-edge bands, in estimating sugarcane yield in Ethiopia's Awash Basin. Using 22 VIs derived from S2 imagery, our approach combines Random Forest (RF) regression with the Recursive Feature Elimination (RFE) algorithm to improve the accuracy of sugarcane yield predictions. The results show that the RF-RFE method outperforms both RF trained on the full set of VIs and Stepwise Multiple Regression (SMR). In particular, VIs based on the red-edge spectral bands of S2 - such as NDVIre1n, NDVIre2n, NDVIre3n, NDRE1 and NDRE2 - were crucial in enhancing prediction precision. These indices from the red-edge and NIR narrow bands consistently influenced yield estimations in both the Wonji-Shoa and Metehara estates. The improved performance of the RF model when paired with the RFE algorithm underscores the critical role of optimal variable selection and reinforces earlier findings that careful variable choice can substantially boost model accuracy. Using the Out-of-Bag RMSE (OOB_RMSE) error estimate for evaluation, we observed that OOB_RMSE varied with the RF parameters and identified an ntree value of 500 as optimal for the studied regions. The RF-RFE model's estimations showed lower errors and higher correlation coefficients than the full-dataset approach, which was hampered by the saturation of traditional VIs.
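The RF-RFE procedure described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes scikit-learn's RandomForestRegressor, a simple backward elimination that drops the least important VI in each round, and selection of the subset with the lowest OOB RMSE. The data are synthetic, the yield values and the names VI_3 to VI_6 are placeholders, and the function names oob_rmse and rf_rfe are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def oob_rmse(X, y, n_trees=500, seed=0):
    """Fit a random forest and return (OOB RMSE, feature importances)."""
    rf = RandomForestRegressor(n_estimators=n_trees, oob_score=True,
                               bootstrap=True, random_state=seed)
    rf.fit(X, y)
    # oob_prediction_ holds, for each sample, the mean prediction of the
    # trees that did not see that sample during training.
    rmse = float(np.sqrt(np.mean((y - rf.oob_prediction_) ** 2)))
    return rmse, rf.feature_importances_


def rf_rfe(X, y, names, n_trees=500, seed=0):
    """Backward elimination: drop the least important feature each round
    and return the subset with the lowest OOB RMSE."""
    keep = list(range(X.shape[1]))
    history = []
    while keep:
        rmse, importances = oob_rmse(X[:, keep], y, n_trees, seed)
        history.append(([names[i] for i in keep], rmse))
        keep.pop(int(np.argmin(importances)))  # remove the weakest feature
    best_names, best_rmse = min(history, key=lambda item: item[1])
    return best_names, best_rmse, history


# Synthetic stand-in for a table of VIs per field (assumption, not study data).
rng = np.random.default_rng(42)
names = ["NDVIre1n", "NDRE1", "VI_3", "VI_4", "VI_5", "VI_6"]
X = rng.uniform(0.0, 1.0, size=(150, len(names)))
# Hypothetical yield driven by the first two columns plus noise.
y = 60.0 + 80.0 * X[:, 0] + 50.0 * X[:, 1] + rng.normal(0.0, 3.0, size=150)

best_names, best_rmse, history = rf_rfe(X, y, names, n_trees=500)
full_rmse = history[0][1]  # OOB RMSE with all features retained
print("selected:", best_names)
print("OOB RMSE full vs. selected:", round(full_rmse, 2), round(best_rmse, 2))
```

The n_trees argument plays the role of ntree in the abstract. Other RF parameters, such as the number of variables tried at each split, are left at the library defaults here.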
Our findings align with earlier studies, highlighting the efficiency of S2's red-edge bands in diverse estimation tasks and a shift away from hyperspectral imagery towards freely available broadband images like S2, owing to reduced data redundancy and processing costs. In conclusion, our findings reveal that RF regression, particularly when integrated with the RFE algorithm, is a powerful tool in remote sensing applications. The optimal VIs derived from S2 imagery, predominantly from the red-edge bands, exhibit significant potential for sugarcane yield estimation. The strong results of the RF-RFE method, evident in metrics like MAE, MAPE, RMSE, Mean percentage, and R², support its value in sugarcane yield prediction and highlight its potential for optimizing irrigation management strategies and broader agricultural planning.
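For readers less familiar with the error metrics named above, the following sketch computes MAE, MAPE, RMSE and R² from paired observed and predicted yields. It assumes the standard textbook definitions, with R² taken as the coefficient of determination. The abstract does not define "Mean percentage", so it is omitted. The function name yield_metrics and the sample values are hypothetical.

```python
import math


def yield_metrics(observed, predicted):
    """Return MAE, MAPE (%), RMSE and R² for paired yield values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE requires non-zero observed values.
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mape": mape, "rmse": rmse, "r2": r2}


# Made-up yields for illustration only (not data from the study).
observed = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 115.0, 150.0, 155.0]
metrics = yield_metrics(observed, predicted)
print(metrics)
```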