MODELLING OF LAND SURFACE TEMPERATURE USING GRAY LEVEL CO-OCCURRENCE MATRIX AND RANDOM FOREST REGRESSION
Keywords: GLCM, Vegetation Indices, Built-up Indices, Exploratory Regression, Surface Area Volume Ratio
Abstract. Land surface temperature (LST) is modelled to explain its spatial and temporal variations using a set of explanatory variables. In a previous study, LST was modelled as a linear function of vegetation cover and built-up cover, quantified by the normalized difference vegetation index (NDVI) and the normalized difference built-up index (NDBI), respectively, together with albedo, solar radiation (SR), surface area-volume ratio (SVR), and sky view factor (SVF). SVF requires a digital surface model (DSM) of sufficient resolution, while SVR computation needs 3D volumetric features representing buildings as input. These inputs are typically not readily available. In addition, NDVI and NDBI do not fully describe the spatial variability of vegetation and built-up cover within an LST pixel. In this study, PlanetScope images (3 m resolution) were processed to provide soil-adjusted vegetation index (SAVI) and VgNIR Built-up Index (VgNIR-BI) layers. The following gray level co-occurrence matrix (GLCM) texture features were generated from SAVI and VgNIR-BI: Mean, Variance, Homogeneity, Contrast, Dissimilarity, Entropy, Second Moment, and Correlation. Random Forest regression (RFR) was run for several cases with different combinations of GLCM features and non-GLCM variables. Using GLCM features alone yielded less satisfactory models. However, adding GLCM features to the other variables resulted in a lower mean squared error (MSE) and a slight increase in R2. Considering NDBI, NDVI, SAVI_GLCM_contrast, VgNIR-BI_GLCM_contrast, VgNIR-BI_GLCM_dissimilarity, and SAVI_GLCM_contrast only, the RF model yielded an MSE of 1.657 and a validation R2 of 0.822. Although this 6-variable model performs slightly worse, it eliminates the need for the DSM and 3D building models required to generate the SVF and SVR layers. Exploratory regression (ER) was also conducted. The best 6-variable ER model (Adj. R2=0.79) consists of SVR, NDBI, NDVI, SAVI_GLCM_second_moment, VgNIR-BI_GLCM_mean, and VgNIR-BI_GLCM_entropy. In comparison, ordinary least squares (OLS) regression using the 6 non-GLCM variables yielded an Adj. R2=0.691. The results of RFR and ER both indicate that GLCM features contribute valuable information to models of LST. LST is best described by combining non-GLCM variables with GLCM features that describe both relatively homogeneous areas (i.e., dominant land cover or low-frequency areas) and more heterogeneous areas (i.e., edges or high-frequency areas).
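To make the eight GLCM texture features named above concrete, the following is a minimal pure-Python sketch of how they can be derived from one window of a quantized index layer. It is an illustration under stated assumptions, not the processing chain used in this study: the function names `glcm` and `glcm_features`, the single horizontal offset, the symmetric normalization, the number of gray levels, and the two toy windows are all hypothetical choices made here. In practice the features would be computed in a moving window over the SAVI and VgNIR-BI rasters.

```python
import math

def glcm(image, levels, dr=0, dc=1):
    """Symmetric, normalized GLCM of a 2D list of integer gray levels.

    (dr, dc) is the pixel offset; (0, 1) pairs each pixel with its
    right-hand neighbour. The offset is an assumption of this sketch.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count both directions (symmetric)
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Eight common texture features from a symmetric, normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    # For a symmetric GLCM the row and column means (and variances) coincide.
    mean = sum(i * v for i, j, v in cells)
    var = sum((i - mean) ** 2 * v for i, j, v in cells)
    cov = sum((i - mean) * (j - mean) * v for i, j, v in cells)
    return {
        "mean": mean,
        "variance": var,
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for i, j, v in cells if v > 0),
        "second_moment": sum(v * v for i, j, v in cells),
        # Correlation is undefined for a constant window; 1.0 is the
        # convention chosen in this sketch.
        "correlation": cov / var if var > 0 else 1.0,
    }

# Toy windows (hypothetical data): a uniform patch and a checkerboard.
flat = glcm_features(glcm([[2, 2], [2, 2]], levels=4))
edge = glcm_features(glcm([[0, 1], [1, 0]], levels=2))
```

The uniform window gives zero contrast and a second moment of 1, while the checkerboard gives high contrast and dissimilarity. This mirrors the distinction drawn above between features that respond to homogeneous (low-frequency) areas and those that respond to edges (high-frequency areas).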