Comparison of machine learning and statistical approaches for digital elevation model (DEM) correction: interim results

Several methods have been proposed for correcting the elevation bias in digital elevation models (DEMs), for example linear regression. Supervised machine learning now enables the modelling of complex relationships between variables and has been deployed by researchers in a variety of fields. In the existing literature, several studies have adopted either machine learning or statistical approaches for DEM correction. However, to our knowledge, none of these studies has compared the performance of the two approaches, especially with regard to open-access global DEMs. Our previous work has already shown the potential of machine learning approaches, specifically gradient boosted decision trees (GBDTs), for DEM correction. In this study, we share some results from a comparison of three recent implementations of gradient boosted decision trees (XGBoost, LightGBM and CatBoost) against multiple linear regression (MLR) for enhancing the vertical accuracy of the 30 m Copernicus and AW3D global DEMs in Cape Town, South Africa.


INTRODUCTION
Several methods have been proposed for correcting the elevation bias in digital elevation models (DEMs), for example linear regression (e.g. Preety et al., 2022). Supervised machine learning now enables the modelling of complex relationships between variables and has been deployed by researchers in a variety of fields. In the existing literature, several studies have adopted either machine learning or statistical approaches for DEM correction. However, to our knowledge, none of these studies has compared the performance of the two approaches, especially with regard to open-access global DEMs. Our previous work has already shown the potential of machine learning approaches, specifically gradient boosted decision trees (GBDTs), for DEM correction (e.g. Okolie et al., 2023). In this study, we share some results from a comparison of three recent implementations of gradient boosted decision trees (XGBoost, LightGBM and CatBoost) against multiple linear regression (MLR) for enhancing the vertical accuracy of the 30 m Copernicus and AW3D global DEMs in Cape Town, South Africa.

METHODOLOGY
The training/input datasets comprise eleven predictor variables: elevation, slope, aspect, surface roughness, topographic position index, terrain ruggedness index (TRI), terrain surface texture, vector ruggedness measure, percentage bare ground, urban footprints and percentage forest cover. The target variable (elevation error) was derived with respect to highly accurate airborne LiDAR. Since multicollinearity is not a major concern for decision trees, all the input variables were fed into the gradient boosted decision trees (GBDTs), which were trained using Python scripts in the Google Colaboratory environment. In general, the models, trained with default hyperparameters, performed well and demonstrated strong predictive capability. In the case of MLR, surface roughness and TRI were flagged during multicollinearity diagnostics (Pearson's correlation and Variance Inflation Factor) and excluded from the input variables. Thus, using MLR, the elevation error was expressed as a linear combination of nine input variables. The MLR was implemented in R using the lm() function. Both model types (GBDTs and MLR) were evaluated at several implementation sites for prediction and correction of DEM error. The corrections were achieved by subtracting the predicted elevation errors from the original elevations (i.e., DEM Corrected = DEM Original − ∆h, where ∆h is the predicted elevation error).
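The screening and correction steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy values and the 0.8 correlation threshold are all assumptions made for illustration, and only the pairwise Pearson check is shown (the Variance Inflation Factor diagnostic is omitted). The correction follows the stated relation DEM Corrected = DEM Original − ∆h.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def flag_collinear_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| above the threshold.

    The default threshold of 0.8 is an assumed value, not one from the study.
    """
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged


def correct_dem(dem_original, predicted_error):
    """Apply DEM_corrected = DEM_original - delta_h, cell by cell."""
    return [z - dh for z, dh in zip(dem_original, predicted_error)]


# Toy values invented purely for illustration (not data from the study).
predictors = {
    "slope": [1.0, 2.0, 3.0, 4.0, 5.0],
    "roughness": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to slope
    "aspect": [10.0, 350.0, 90.0, 180.0, 45.0],
}
flagged = flag_collinear_pairs(predictors)

# Hypothetical original elevations (m) and model-predicted errors (m).
corrected = correct_dem([105.0, 98.5, 120.0], [2.5, -1.0, 4.0])

print(flagged)
print(corrected)
```

In this toy case only the slope and roughness pair is flagged, which mirrors the idea of dropping strongly correlated predictors before fitting a linear model, while tree-based models can keep all inputs.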

RESULTS AND DISCUSSION
Numerous terrain offsets degraded the accuracy of the original DEMs. In several instances, these offsets were reduced after correction (e.g. Figures 1 and 2). Table 1 compares the percentage reduction in RMSE of the AW3D and Copernicus DEMs after correction. In the urban/industrial and grassland/shrubland landscapes, the RMSE of the original AW3D DEM was reduced by more than 70% after correction. Similarly, the RMSE was reduced in the other landscapes: agricultural (>45%), peninsula (>50%) and mountainous (>13%). The corrections also improved the accuracy of the Copernicus DEM, e.g. >44% RMSE reduction in the urban area and >32% RMSE reduction in the grassland/shrubland landscape. Both the statistical (MLR) and machine learning (GBDT) approaches achieved substantial corrections of the AW3D and Copernicus DEMs. While MLR outperformed the GBDTs in one scenario (i.e. the Copernicus DEM in the grassland/shrubland landscape), the GBDTs outperformed MLR in most landscapes.
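As an illustration of how a percentage reduction in RMSE can be computed, the following Python sketch assumes that the elevation error is defined as DEM minus reference (LiDAR) elevation and that the reduction is expressed relative to the RMSE of the original DEM. The function names and all values are hypothetical and are not taken from Table 1.

```python
import math


def rmse(errors):
    """Root mean square of a sequence of elevation errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmse_reduction_percent(rmse_before, rmse_after):
    """Percentage reduction in RMSE relative to the original DEM."""
    return 100.0 * (rmse_before - rmse_after) / rmse_before


# Toy elevations in metres, invented for illustration only.
reference = [100.0, 50.0, 75.0, 20.0]       # e.g. airborne LiDAR
dem_original = [104.0, 47.0, 79.0, 23.0]
dem_corrected = [101.0, 49.0, 76.0, 21.0]

# Assumed sign convention: error = DEM elevation - reference elevation.
err_before = [d - r for d, r in zip(dem_original, reference)]
err_after = [d - r for d, r in zip(dem_corrected, reference)]

rmse_before = rmse(err_before)
rmse_after = rmse(err_after)
reduction = rmse_reduction_percent(rmse_before, rmse_after)

print(f"RMSE before: {rmse_before:.2f} m")
print(f"RMSE after:  {rmse_after:.2f} m")
print(f"Reduction:   {reduction:.1f}%")
```

A positive value indicates that the correction lowered the RMSE; a negative value would indicate that the corrected DEM is less accurate than the original.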

CONCLUSION
The comparison demonstrates the robustness of the GBDT-based correction in virtually all the landscapes under consideration. Future studies could integrate other approaches in the comparison.

Figure 1. Absolute height error comparison of corrected DEMs in urban landscape