FEATURE FILTERING AND SELECTION FOR DRY MATTER ESTIMATION ON PERENNIAL RYEGRASS: A CASE STUDY OF VEGETATION INDICES
Keywords: Feature Selection, Collinearity, Vegetation Indices, Biomass, Dry Matter, Pasture, Perennial Ryegrass, Machine Learning
Abstract. Vegetation indices (VIs) have been extensively employed as features for dry matter (DM) estimation. Over the past five decades, more than a hundred vegetation indices have been proposed. As a result, selecting the optimal index or subset of indices is neither trivial nor obvious. This study, based on year-round observations of perennial ryegrass (n = 900), indicates that for this response variable (i.e. kg.DM.ha−1), more than 80% of indices present a high degree of collinearity (|correlation| > 0.8). Additionally, the absence of an established workflow for feature selection and modelling is a handicap when trying to establish meaningful relations between spectral data and biophysical/biochemical features. Within this case study, an unsupervised and supervised filtering process is applied to an initial dataset of 97 VIs. This research analyses the effects of the proposed filtering and feature selection process on the overall stability of the final models. The analysis thereby provides a straightforward framework to filter and select VIs. This approach was able to provide a reduced feature set for a robust model and to quantify trade-offs between optimal models (i.e. lowest root mean square error, RMSE = 412.27 kg.DM.ha−1) and tolerable models (with a smaller number of features, 4 VIs, and within 10% of the lowest RMSE).
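As an illustration of the collinearity criterion above, the following minimal sketch shows one common way an unsupervised correlation filter can be written. It is not the workflow of this study, whose details are not given in the abstract: the greedy keep-first strategy, the function names `pearson` and `filter_collinear`, and the toy index names `vi_a`, `vi_b` and `vi_c` are all assumptions made for the example. Only the 0.8 threshold on the absolute correlation is taken from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Constant feature: correlation is undefined, treated here as 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_collinear(features, threshold=0.8):
    """Greedy filter (assumed strategy): keep a feature only if its absolute
    correlation with every previously kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: three made-up indices observed on five plots.
features = {
    "vi_a": [0.10, 0.20, 0.30, 0.40, 0.50],
    "vi_b": [0.21, 0.39, 0.62, 0.80, 1.01],  # almost a rescaled copy of vi_a
    "vi_c": [0.50, 0.10, 0.40, 0.20, 0.30],  # weakly related to vi_a
}

kept = filter_collinear(features, threshold=0.8)
print(kept)
```

In this toy case `vi_b` is discarded because it is almost perfectly correlated with `vi_a`, while `vi_c` is retained. A greedy filter of this kind depends on the order in which features are visited, so in practice the ordering (for example, by relevance to the response variable) would need to be chosen deliberately.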