Estimating Forest Canopy Height based on GEDI Lidar Data and Multi-source Remote Sensing Images

Estimating forest canopy height is crucial for assessing aboveground biomass and carbon sequestration. Light detection and ranging (lidar) is an important technology because of its ability to capture vertical structural information. However, due to instrument limitations and cost constraints, acquiring large-scale and continuous forest data solely through lidar is challenging. To compensate for this, remote sensing images can be used to cover wide regions. Therefore, leveraging multi-source data for constructing canopy height models (CHMs) holds great promise in this field. The objective of this study is to evaluate and compare the contributions of multi-source remote sensing data and methods in estimating forest canopy height. In constructing the CHM, the commonly used random forest (RF) and fully convolutional network (FCN) are assessed. The canopy height obtained from GEDI was used as the reference data, and Landsat 8 and Sentinel-2 data were used for prediction. Multiple CHMs were constructed for the Dabie Mountains, Central China, in 2019 based on different data sources and methods, and were then comparatively analysed. The results showed that (1) with RF, the accuracy of the CHM using Sentinel-2 as input is marginally better than that using Landsat 8, although the difference is insignificant; and (2) FCN is less accurate than RF despite domain-specific fine-tuning, although further improvement in accuracy is expected by ensembling more FCN models.


Introduction
Forest canopy height is a representative parameter of forest structure, particularly in its ability to reflect aboveground biomass, which holds significant implications for carbon sequestration estimation (Tolan et al., 2024). Light detection and ranging (lidar), as an active remote sensing technology, possesses the capability to acquire the vertical structure of trees and obtain tree canopy height (Lefsky et al., 2002). However, due to cost constraints, airborne lidar data is not suitable for large-scale and repetitive forest observations (Coops et al., 2021). On the other hand, spaceborne lidar can overcome this limitation. Spaceborne lidar can obtain the vertical structure of forests, including canopy height, but its footprint is discrete and has low spatial density, so continuous coverage of canopy height cannot be directly obtained. Integrating spaceborne lidar footprints with continuous optical remote sensing images is a significant approach to estimating forest canopy height with large-scale continuous coverage (Wang et al., 2023).
Remote sensing images are becoming increasingly abundant, offering a wealth of data sources that can be harnessed to enhance canopy height models (CHMs) through the utilization of multi-source data. However, it is important to note that the mere incorporation of multiple data sources does not always result in a significant improvement in model accuracy. In fact, it may lead to an increased reliance on the data, potentially limiting the applicability of the model (Fayad et al., 2024). To address this issue, the selection of the most appropriate data source for a specific study becomes crucial. Researchers must carefully consider various factors, such as the spatial resolution, spectral characteristics, and temporal coverage of the data sources, to ensure compatibility with their objectives. This selection process requires a thorough understanding of the strengths and limitations of different remote sensing platforms and sensors.
In parallel, advancements in deep learning techniques have facilitated the integration of neural networks into canopy height estimation, resulting in superior performance compared to traditional regression analysis and conventional machine learning methods (Fayad et al., 2024; Illarionova et al., 2022). These deep learning models have demonstrated exceptional capabilities in capturing complex relationships and patterns within the data, thereby leading to more accurate and robust CHMs. However, despite these advancements, there remains a lack of consensus regarding the optimal data sources and model architectures for canopy height estimation. Many studies have focused on specific data sources or models, disregarding the potential benefits of integrating multiple data sources or exploring alternative model architectures. For instance, Wang et al. (2023) primarily investigated the impact of multimodal spaceborne lidar data on CHMs without analyzing the potential contribution of multi-source optical remote sensing images. Similarly, Gupta and Sharma (2022) compared various machine learning methods without considering the latest advancements in neural networks. Consequently, there is a clear need for further research to improve the accuracy of canopy height estimation by carefully evaluating and selecting appropriate data sources and exploring innovative model architectures. This can involve investigating the synergistic benefits of combining different remote sensing modalities, such as optical imagery, lidar data, and synthetic aperture radar (SAR) data. Additionally, exploring novel deep learning architectures, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), specifically tailored for CHM tasks, holds promise for achieving more accurate and reliable results.
To this end, the main objective of this study is to assess and compare the contributions of multi-source remote sensing data and methods in estimating canopy height. The random forest (RF) algorithm and a model based on the fully convolutional neural network (FCN) architecture (Lang et al., 2023) are comparatively analysed, using both Landsat and Sentinel data. We conducted our study in the Dabie Mountains, which are located at the junction of Anhui, Hubei, and Henan provinces in central China. The Dabie Mountains extend about 380 kilometers from east to west and 175 kilometers from north to south. The area is a special distribution area of rare and endangered wild animals and plants in China, and is a national nature reserve. However, there is a lack of research on long time-series, large-scale canopy height mapping in this region.

Study site
The forest range and GEDI (Global Ecosystem Dynamics Investigation) footprint distributions in the Dabie Mountains are shown in Figure 1. The forest area of the Dabie Mountains was derived after applying a forest type coverage mask to the data. Each footprint of the GEDI data has a width of 25 m, with a spacing of 60 m between consecutive footprints. Additionally, there are 8 parallel tracks simultaneously sampling the area (Jucker et al., 2023).

Remote sensing images
The Google Earth Engine (GEE) dataset provides rich historical images. This study obtained Landsat 8 and Sentinel-2 images for the entire year of 2019 from the GEE platform, and completed cloud removal and fusion to obtain one composite image of the Dabie Mountains in 2019 for each sensor.

Auxiliary data
In order to remove non-forest areas from the study area, we used the 2020 global 30 m land-cover map publicly released by Zhang et al. (2021).

Overall workflow
In this study, we employed Landsat 8 and Sentinel-2 images as training data for feature extraction and prediction. GEDI data was used as reference data to extract canopy height information.
In order to compare the performance of different data sources on the same model, Landsat 8 and Sentinel-2 images were each used to construct a CHM based on RF. In order to compare the performance of different models on the same data source, we used the FCN model released by Lang et al. (2023) to predict canopy height for the Dabie Mountains region, and compared the result with our RF model. During the construction of the FCN model by Lang et al. (2023), five separate models were trained with different weights. The resulting five models are then fused to obtain an optimal outcome. Furthermore, we fine-tuned the FCN model by retraining it with Sentinel-2 data and GEDI data from the Dabie Mountains region.
Upon acquiring the CHMs, we conducted accuracy evaluations and comparative analyses of the final predictions using GEDI data as the validation dataset.
To ease reading, the CHMs based on RF with Landsat 8 or Sentinel-2 as input are referred to as L8RF or S2RF, respectively. The CHM based on the FCN model with Sentinel-2 as input is referred to as S2FCN-5, and the CHM based on the FCN model that has undergone retraining is referred to as S2FCN-1; the suffixes 1 and 5 indicate whether one or five sets of model weights were used. The framework of the research methodology is shown in Figure 2.

Random forest
RF is a powerful machine learning algorithm widely applied in data mining and predictive modeling. It is an ensemble learning method that predicts outcomes by constructing multiple decision trees and combining their outputs (Breiman, 2001).
In this study, we chose RF as our predictive model to estimate forest canopy height. We integrated a variety of feature variables, including nine vegetation indices and six feature components in addition to the original bands of the remotely sensed image. By inputting these features into the RF model, we were able to generate comprehensive and extensive CHMs by regression prediction using canopy height data provided by GEDI and optical image data.

Fully convolutional neural network
FCN is a deep learning model based on convolutional neural networks (CNNs) that lacks fully connected layers and consists solely of convolutional layers (Long et al., 2015). FCN is primarily employed for image segmentation tasks, aiming to generate pixel-level predictions for images. In semantic segmentation, FCN can predict the class membership of each pixel, thus achieving the objective of partitioning the image into distinct regions. The key advantage of FCN is its ability to handle input images of arbitrary sizes, as convolutional operations can be performed on inputs of any dimensions. Furthermore, FCN can produce output maps with the same dimensions as the input image.
In this study, FCN is applied to extract tree canopy height information from Sentinel-2 optical satellite images. The model combines sparse height data from the GEDI spaceborne lidar mission with Sentinel-2 satellite imagery to map the canopy height.

Experiment
The main steps in constructing the CHMs in this study are data preprocessing, model construction, accuracy assessment, and comparison.

Preprocessing
The preprocessing of GEDI mainly involves data screening and conversion into rasters, while the preprocessing of remote sensing image data focuses on cloud removal, fusion and feature extraction.

GEDI data
In order to obtain high-precision canopy height, we filtered all GEDI data obtained in 2019. Mainly utilizing the attributes of L2A level products (Liu et al., 2022; Potapov et al., 2021), the following conditions were set:
1. Collected at night.
2. In power beam mode.
3. Sensitivity not less than 0.9.
4. Quality flag equal to 1.
5. Degrade flag equal to 0.
6. The ground elevation of the GEDI footprint location differs from the SRTM elevation by less than 50 meters.
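The six screening conditions can be expressed as a single predicate over footprint records. The following is a minimal sketch, not the processing code used in the study: the dictionary keys, the use of a negative solar elevation to mean "night", and the set of power beam names are all assumptions that should be checked against the GEDI L2A product actually used.

```python
# Assumed names of the four full-power GEDI beams (verify against the product).
POWER_BEAMS = {"BEAM0101", "BEAM0110", "BEAM1000", "BEAM1011"}


def passes_quality_filter(fp, min_sensitivity=0.9, max_elev_diff=50.0):
    """Return True if a footprint record meets all six screening conditions.

    `fp` is a dict with hypothetical keys chosen for this illustration.
    """
    return (
        fp["solar_elevation"] < 0                      # 1. collected at night
        and fp["beam"] in POWER_BEAMS                  # 2. power beam
        and fp["sensitivity"] >= min_sensitivity       # 3. sensitivity >= 0.9
        and fp["quality_flag"] == 1                    # 4. quality flag
        and fp["degrade_flag"] == 0                    # 5. degrade flag
        and abs(fp["elev_lowestmode"] - fp["srtm_elevation"]) < max_elev_diff  # 6.
    )


# Three made-up footprints: one valid, one coverage beam, one daytime shot.
footprints = [
    {"beam": "BEAM0101", "solar_elevation": -12.0, "sensitivity": 0.95,
     "quality_flag": 1, "degrade_flag": 0,
     "elev_lowestmode": 412.0, "srtm_elevation": 405.0, "rh95": 18.4},
    {"beam": "BEAM0000", "solar_elevation": -12.0, "sensitivity": 0.95,
     "quality_flag": 1, "degrade_flag": 0,
     "elev_lowestmode": 412.0, "srtm_elevation": 405.0, "rh95": 17.1},
    {"beam": "BEAM1000", "solar_elevation": 35.0, "sensitivity": 0.97,
     "quality_flag": 1, "degrade_flag": 0,
     "elev_lowestmode": 300.0, "srtm_elevation": 298.0, "rh95": 22.0},
]

kept = [fp for fp in footprints if passes_quality_filter(fp)]
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the retained sample size is to each condition.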
Only footprints that meet all quality requirements are retained. Based on the 2020 land classification data released by Zhang et al. (2021), we removed the GEDI footprints that fall outside the forest. For each laser footprint, we use the 95% energy return height relative to the ground (RH95) of the suggested result as the canopy height obtained by GEDI (Potapov et al., 2021).
The canopy height data obtained from GEDI is predominantly clustered around the median value, with limited representation in the lower and higher ranges. This imbalanced dataset could bias the model towards capturing the characteristic features of the majority class while neglecting those of the minority class, consequently impacting the overall predictive performance. To alleviate this problem, we resampled the training set of the RF. The canopy height was divided into 20 classes at intervals of 2 meters. Classes with few samples were oversampled by cloning samples, while classes with numerous samples were undersampled by randomly removing samples. After balancing the dataset, the training set had approximately 4000 samples in each canopy height interval.
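The resampling step can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name, the `(features, height)` sample layout, and the clamping of out-of-range heights into the first or last class are choices made for the example, and the demo uses a target of 4 samples per class instead of roughly 4000.

```python
import random
from collections import defaultdict


def balance_by_height(samples, bin_width=2.0, n_bins=20, target=4000, seed=0):
    """Resample (features, height) pairs so each height class has `target` items.

    Sparse classes are oversampled by cloning existing samples; dense classes
    are undersampled by random removal. Empty classes stay empty.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for features, height in samples:
        # Clamp into [0, n_bins - 1] so very tall canopies fall in the last class.
        idx = max(0, min(int(height // bin_width), n_bins - 1))
        bins[idx].append((features, height))

    balanced = []
    for idx in sorted(bins):
        members = bins[idx]
        if len(members) >= target:
            balanced.extend(rng.sample(members, target))       # undersample
        else:
            balanced.extend(members)                            # keep originals
            balanced.extend(rng.choices(members, k=target - len(members)))  # clone
    return balanced


# Demo: eight samples near 11 m (dense class) and two near 3 m (sparse class).
samples = [((i,), 11.0) for i in range(8)] + [((100,), 3.0), ((101,), 3.5)]
balanced = balance_by_height(samples, target=4)
```

Because cloning repeats identical samples, the resampling should be applied to the training set only, after the validation split, so that duplicates cannot leak into validation.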

Train
This study used scikit-learn to construct an RF regression model. Using the canopy height obtained from GEDI as the reference value, the optimal model was obtained by tuning the number of decision trees, the maximum tree depth, the minimum number of samples per leaf, and the minimum number of samples per branch node through three-fold cross-validation. After optimizing the hyperparameters, S2RF has 280 decision trees, a maximum tree depth of 470, a minimum of 2 samples per leaf, and a minimum of 1 sample per branch node. L8RF has 270 decision trees, a maximum tree depth of 300, a minimum of 2 samples per leaf, and a minimum of 1 sample per branch node.
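A three-fold cross-validated search over these four hyperparameters might look like the sketch below. It assumes scikit-learn's `GridSearchCV` as the search tool, which the text does not specify, and it runs on synthetic data with a deliberately tiny placeholder grid; neither the data nor the grid values are those of the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the feature table (rows = footprints, columns = features)
# and for the GEDI canopy heights used as regression targets.
rng = np.random.default_rng(0)
X = rng.random((120, 6))
y = 30.0 * X[:, 0] + rng.normal(0.0, 1.0, 120)

# Placeholder grid over the four tuned hyperparameters.
param_grid = {
    "n_estimators": [20, 40],       # number of decision trees
    "max_depth": [5, None],         # maximum tree depth
    "min_samples_leaf": [1, 2],     # minimum samples per leaf
    "min_samples_split": [2, 4],    # minimum samples to split a branch node
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,                                    # three-fold cross-validation
    scoring="neg_root_mean_squared_error",   # select the model with lowest RMSE
)
search.fit(X, y)

best_rf = search.best_estimator_
predicted = best_rf.predict(X[:5])
```

Note that scikit-learn requires an integer `min_samples_split` to be at least 2, so the placeholder grid starts from that value.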

Dataset
The FCN model of Lang et al. (2023) was trained using global Sentinel-2 images from 2020, which differ in scope and time from this study. Therefore, we used GEDI and Sentinel-2 data to create the 2019 Dabie Mountains dataset and retrained the model.
Unlike the RF dataset, this dataset includes not only the 12 Sentinel-2 bands and the canopy heights, but also sample weights, the scene classification (SCL) map, the cloud probability (CLD), and latitude and longitude. CLD is used for cloud masking; the larger the value, the more the pixel is covered by clouds. Because we had already performed cloud removal on Sentinel-2 in GEE, all CLD values were set to 0 in this study.
We cropped a 15 × 15 pixel patch centered on each pixel that has a canopy height, to obtain the pixel values of the 12 bands and the other attributes. Then, we used a softened version of inverse sample-frequency weighting to re-weight each sample (Lang et al., 2023). The canopy height was divided into K bins at intervals of 1 meter, and the number of samples N_k in each bin k was calculated. Formula 1 was used to calculate the weight of the samples in each bin, where q_i is the weight of the samples in bin k = i and K is the total number of bins.
We saved the above attributes for each sample, divided the samples into training and validation sets in an 8:2 ratio, and organized them into H5 files, consistent with the data format used by Lang et al. (2023). This way, the data can be read in without making any changes to the input part of the model code.
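Formula 1 itself is not reproduced in the text above. The sketch below shows one common way to soften inverse-frequency weights, under the explicit assumption that the softening is a square root, i.e. q_i = (1/sqrt(N_i)) / sum over k of (1/sqrt(N_k)). This form and the function name are assumptions for illustration and should be checked against Lang et al. (2023). In this sketch K counts only the non-empty bins.

```python
import math
from collections import Counter


def softened_inverse_frequency_weights(heights, bin_width=1.0):
    """Return (per-sample weights, per-bin weights q) for canopy heights.

    Assumed form: q_i = (1 / sqrt(N_i)) / sum_k (1 / sqrt(N_k)),
    where N_k is the number of samples in height bin k.
    """
    bin_of = [int(h // bin_width) for h in heights]   # 1 m bins by default
    counts = Counter(bin_of)                          # N_k for each non-empty bin
    norm = sum(1.0 / math.sqrt(n) for n in counts.values())
    q = {k: (1.0 / math.sqrt(n)) / norm for k, n in counts.items()}
    return [q[k] for k in bin_of], q


# Demo: four samples in the 10-11 m bin and one sample in the 25-26 m bin.
heights = [10.2, 10.7, 10.9, 10.1, 25.3]
sample_weights, q = softened_inverse_frequency_weights(heights)
```

With plain inverse-frequency weighting the rare 25 m sample would be weighted four times as much as each 10 m sample; with the square root the ratio drops to two, which is the sense in which the weighting is "softened".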

Train
This model is based on the FCN architecture in Lang et al. (2019). However, to speed up deployment, its size was reduced by setting the number of blocks to 8 and the number of filters in each block to 256.
The input of this model consists of the 12 bands of Sentinel-2 and the cyclically encoded geographic coordinates, for a total of 15 channels. Its output is the height predicted by the model and its variance, which have the same spatial dimensions as the input. We trained the model on the Dabie Mountains dataset using sparse supervision, as in previous studies. Before the input data was passed into the convolutional layers, each channel was normalized to a standard normal distribution using the statistics of the training set. The canopy height used for calibration was normalized in the same way. The neural network was trained using the Adam optimiser over 51,000 iterations with a batch size of 1600. The base learning rate was 0.0001, and it was reduced by a factor of 0.1 after 10,200 iterations and again after 20,400 iterations. The change in model accuracy during training is shown in Figure 3.
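Two details of this setup are easy to get wrong and can be sketched in a few lines: the coordinate encoding and the stepped learning rate. The encoding below (raw latitude plus the sine and cosine of longitude) is an assumption that is merely consistent with three extra channels; the exact encoding used by the model should be taken from Lang et al. (2023). The function names are hypothetical.

```python
import math


def encode_coordinates(lat_deg, lon_deg):
    """Return three coordinate channels: latitude, sin(longitude), cos(longitude).

    Encoding longitude on the unit circle removes the artificial jump
    between +180 and -180 degrees. (Assumed encoding, for illustration.)
    """
    lon = math.radians(lon_deg)
    return (lat_deg, math.sin(lon), math.cos(lon))


def learning_rate(iteration, base_lr=1e-4, milestones=(10_200, 20_400), factor=0.1):
    """Step schedule: multiply the base rate by `factor` at each milestone passed."""
    drops = sum(1 for m in milestones if iteration >= m)
    return base_lr * factor ** drops


# Demo: a point in the study region and the rate at three stages of training.
channels = encode_coordinates(31.0, 116.0)
rates = [learning_rate(i) for i in (0, 10_200, 50_999)]
```

In a deep learning framework the same schedule would normally be delegated to a built-in multi-step scheduler; the plain function here only makes the milestones explicit.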

Accuracy evaluation
We used two metrics to assess the accuracy of the CHM: root mean square error (RMSE) and mean absolute error (MAE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - f(x_i)\right)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - f(x_i)\right|$$

where N represents the number of samples, y_i represents the ground truth values, and f(x_i) represents the predicted values.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-1-2024 ISPRS TC I Mid-term Symposium "Intelligent Sensing and Remote Sensing Application", 13-17 May 2024, Changsha, China
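For reference, RMSE and MAE in their standard definitions can be computed as in the sketch below. The function names and the three demo values are illustrative only and are not results from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted heights."""
    n = len(observed)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted heights."""
    n = len(observed)
    return sum(abs(y - f) for y, f in zip(observed, predicted)) / n


# Demo with three made-up footprints (heights in metres).
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
demo_rmse = rmse(observed, predicted)   # sqrt((4 + 4 + 9) / 3)
demo_mae = mae(observed, predicted)     # (2 + 2 + 3) / 3
```

RMSE penalizes large errors more strongly than MAE, so a gap between the two indicates that a few footprints carry disproportionately large errors.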

Results
The estimated forest canopy height maps are shown in Figure 4. The results show that the L8RF predicts a maximum canopy height of 39.5 m and that the dispersed forests located in the west have lower canopy heights than the dense forests in the east. The S2RF predicts higher canopy heights in the central and eastern forests, with a maximum of 32.9 m. The S2FCN-1 predicts higher canopy heights north and south of the centre, with a maximum of 38.0 m. The S2FCN-5 predicts that the highest canopy, at 35.6 m, is located in the western centre. There was some variation in the predictions of high canopy among the four models. Consistently, each model predicted lower canopy heights in the eastern and western dispersed forests.
The results of the accuracy assessment of L8RF, S2RF, S2FCN-1 and S2FCN-5 using the canopy height obtained by GEDI as the observation are shown in Table 2, and the scatter plots are shown in Figure 5.
With more bands and higher resolution than Landsat 8, Sentinel-2 might have been expected to be superior in canopy height prediction. However, this was not the case in the experimental results, and its characteristics did not lead to exceptionally good results. This is probably because the GEDI footprint size is closer to the Landsat 8 resolution and RF is a pixel-based method, so the closer match between GEDI footprints and Landsat 8 pixels aids accurate estimation. Therefore, the similar accuracy of the final L8RF and S2RF may be the result of the interaction of several factors, such as the number of bands and the resolution.

Metrics
Although the accuracy of L8RF and S2RF is not much different, the canopy height estimated by S2RF has a higher resolution. This is an advantage of Sentinel-2 for CHM construction. S2RF has better RMSE and MAE than S2FCN-1.
S2RF is based on classical machine learning, which performs pixel-based regression. It uses hand-crafted input features to construct the model. S2FCN-1 is based on deep learning and learns new features autonomously. The better accuracy of S2RF is most likely because the features we input for RF are more relevant to canopy height, while FCN does not learn those important features. This suggests that an RF with reasonable input features can be more accurate in canopy height estimation than a single FCN for which no features were extracted from the data source.
S2FCN-1 and S2FCN-5 use exactly the same input data for prediction and the overall framework of the model is consistent, but the results are different. The RMSE of S2FCN-5 is lower, possibly because Lang et al. (2023) estimate canopy height using five FCN models weighted by the estimated aleatoric uncertainties. The result of S2FCN-5 is the weighted average of the five models. This processing has been shown to help improve model accuracy. In this study, only one model was trained to estimate canopy height, which may result in some accuracy loss compared to combining multiple models. However, the lower accuracy of S2FCN-5 in this study compared to the original article is likely because the data used to train S2FCN-5 came from the global Sentinel-2 imagery in 2020, whereas the inputs used for prediction came from the Dabie Mountains in 2019. This suggests that it is best to ensure that the data used to train the model and the data used for prediction come from the same source.
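One simple way to weight ensemble members by their predicted uncertainty is inverse-variance averaging, sketched below for a single pixel. This is an assumed weighting scheme used for illustration; the exact fusion rule of Lang et al. (2023) is not given in the text and may differ. The function name and demo numbers are hypothetical.

```python
def fuse_predictions(means, variances):
    """Inverse-variance weighted average of per-model height predictions.

    Each model contributes a predicted height (mean) and a predicted
    variance; more confident models (smaller variance) get larger weights.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, means)) / total


# Demo: two models predict 20 m and 24 m; the first is three times as confident.
fused = fuse_predictions(means=[20.0, 24.0], variances=[1.0, 3.0])
```

The fused value lies closer to the more confident model (21 m here rather than the plain average of 22 m), which illustrates why a single retrained model such as S2FCN-1 cannot benefit from this effect.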

Discussion
We compared the impact of various remote sensing data and models on CHM construction, but some aspects can be improved and explored further.
In model training, the accuracy of S2FCN-1, for which we generated the dataset and performed the training ourselves, still requires improvement. Despite utilizing the dataset from the Dabie Mountains for training, the accuracy of S2FCN-1 does not surpass that of S2FCN-5. Accuracy could be improved by, for example, ensembling more variants of the FCN. All four models compared in this study gave poor estimates for both low and high canopy. These issues require further investigation.
In the comparative analysis, we used the same data source for the construction of S2RF and S2FCN-1, but due to the different inputs required for RF and FCN, the Sentinel-2 data had to be preprocessed differently. In this process, we made the operations correspond to each other as closely as possible. For example, FCN assigns weights to samples by canopy height, mitigating errors caused by imbalanced canopy heights in the dataset, so we resampled the imbalanced dataset when preprocessing the data for S2RF to reduce the same adverse effects. Such an approach ensures, as far as possible, that the differences in the final results come from the different models and not from the input data. However, the impact of the input on the results cannot be completely eliminated.
In future studies, the effects of other factors on CHM construction can be explored. For example, more comparisons and analyses can be done by incorporating multi-source remote sensing data and time-series features. Besides, choosing the best data and model in an experiment should not only focus on the final results, but also consider the purpose and requirements of the study, as well as equipment, time, and other constraints. Our work can provide a comparison and reference. However, the most appropriate data and model need to be judged by the researcher on a case-by-case basis.

Conclusion
In this study, the contribution of multi-source remote sensing data and methods in estimating forest canopy height was evaluated and compared using the Dabie Mountains as the study site.
In the comparison of remote sensing data, there is little difference in the accuracy of CHMs based on RF using Landsat 8 and Sentinel-2, although the latter provides a higher spatial resolution of the predictions. In the comparison of regression models, RF is more accurate than FCN, indicating that RF with reasonable feature inputs could be more accurate than a complex FCN without fine-tuning or a simple FCN model. Further improvements in accuracy could be pursued by ensembling more variants of the FCN with domain-specific training.

Figure 1. Forest range and GEDI footprint distribution in the Dabie Mountains, Central China.

Figure 2. Framework of the research methodology.
When constructing a CHM based on RF, we used two datasets consisting of GEDI and different remote sensing data. Taking GEDI and Landsat 8 as an example, we used pyGEDI to convert the preprocessed GEDI footprints into a canopy height raster image with a resolution of 25 m, then matched each canopy height with the nearest Landsat 8 pixel and stored it in a new channel, and divided the pixels with canopy height into training and validation sets in an 8:2 ratio. The dataset for GEDI and Sentinel-2 was created in the same way.
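The matching and splitting steps can be illustrated with plain coordinate arithmetic. The sketch below assumes a north-up raster in a projected coordinate system, described by its upper-left corner and pixel size; the function names, the corner coordinates, and the random seed are hypothetical, and the pyGEDI rasterization itself is not reproduced.

```python
import random


def nearest_pixel(x, y, x_min, y_max, pixel_size):
    """Return (row, col) of the raster pixel containing the point (x, y).

    For a regular grid, the pixel containing a point is also the pixel
    whose centre is nearest to it. Assumes a north-up raster whose
    upper-left corner is (x_min, y_max).
    """
    col = int((x - x_min) // pixel_size)
    row = int((y_max - y) // pixel_size)
    return row, col


def split_train_val(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Demo: one canopy height location matched to a 30 m grid, then an 8:2 split.
row_col = nearest_pixel(500045.0, 3499990.0, x_min=500000.0, y_max=3500000.0,
                        pixel_size=30.0)
train, val = split_train_val(range(10))
```

Fixing the seed keeps the split reproducible, so that models built from different imagery are evaluated on the same validation footprints.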

Figure 3. Change plot of the model accuracy.
The results show that S2RF has the lowest RMSE and MAE of 6.931 m and 5.645 m, respectively. The predicted values of the L8RF and S2FCN-5 correlate relatively well with the observed values. The S2FCN-5 has an RMSE of 6.0 m in the global experiments.

Analysis

Compare different data
In the model based on RF, the inputs of L8RF and S2RF come from different data sources, but their differences in accuracy assessment metrics are less obvious. The RMSE and MAE of S2RF are slightly smaller than those of L8RF.

Figure 4. Forest canopy height map of the Dabie Mountains in 2019 using different CHMs.
Landsat 8 images
We obtained Landsat 8 images with cloud coverage of less than 20% in 2019 from USGS Landsat 8 Level 2 Collection 2 on GEE, and selected 6 bands (bands 2 to 7) suitable for classification. Cloud removal procedures were applied to each image, followed by a fusion process utilizing the median value. Subsequently, the calculation of vegetation indices, principal components analysis (PCA), and the Tasseled-Cap (T-C) transformation were carried out to obtain 9 vegetation indices, 3 PCA components, and 3 T-C transformation components. The vegetation indices are shown in Table 1. The optical bands, indices, and components from Landsat 8 were combined as input data for the RF model.
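The median fusion and the derivation of an index can be sketched outside GEE with NumPy. This is a simplified illustration, not the GEE workflow used in the study: cloud-masked pixels are assumed to be stored as NaN, the band order in the demo array is hypothetical, and NDVI is shown only as a familiar example of a vegetation index, since the nine indices of Table 1 are not listed in the text.

```python
import numpy as np


def median_composite(stack):
    """Per-pixel median over time, ignoring cloud-masked (NaN) observations.

    `stack` has shape (time, bands, rows, cols).
    """
    return np.nanmedian(stack, axis=0)


def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


# Demo: three acquisition dates, two bands (assumed order: red, NIR), one pixel.
stack = np.array([
    [[[0.1]], [[0.5]]],      # date 1: both bands clear
    [[[np.nan]], [[0.6]]],   # date 2: red band masked as cloud
    [[[0.3]], [[np.nan]]],   # date 3: NIR band masked as cloud
])
composite = median_composite(stack)            # shape (bands, rows, cols)
index = ndvi(nir=composite[1], red=composite[0])
```

The median is robust to residual cloud and shadow pixels that survive masking, which is a common reason for preferring it over the mean when building annual composites.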

Table 1. The indices used in Landsat 8.

After resampling all 13 channels to a resolution of 10 meters on GEE, the required values for the FCN model input were obtained through image filtering, cloud removal, and fusion processing.
The FCN model requires 12 bands that are consistent with the above as model inputs, and a Scene Classification (SCL) map was used for assistance.

Table 2. Results of the accuracy assessment.