Estimating Forest Stock Volume Based on Airborne Lidar Data

Forest stock volume (FSV) is an important indicator for evaluating carbon sequestration potential and is crucial for forest resource management at local, regional, and national scales. To estimate FSV accurately, this article takes Mengyin County, Shandong Province, China as the study area, builds random forest (RF) models for four tree species based on airborne Lidar data, and establishes a monitoring system at three granularities: individual tree, grid, and county. The results demonstrated that all four models generalized well, with no signs of overfitting. In the test phase, the R² of the poplar and pine models exceeded 0.9, the R² of the cypress model was 0.81, and the rRMSE of all three models remained within 20%, indicating a good fit for these three species. The accuracy of the Robinia pseudoacacia model was relatively poor, with an R² of 0.60 and an rRMSE of 20.60%. This study provides a feasible method for estimating forest stock volume at the county level, offering technical support for forest resource management and planning and helping to promote sustainable forestry development.


Introduction
Forests are the mainstay of terrestrial ecosystems; their annual carbon sequestration accounts for about two-thirds of that of all terrestrial ecosystems (Post et al. 1982; He et al. 2022). They play an irreplaceable role in regulating the global carbon balance, mitigating the greenhouse effect, and tackling climate warming (Doelman et al. 2019). Accurately estimating the carbon sequestration potential of forests therefore provides important guidance for formulating action plans to address global climate change and to increase sequestration and reduce emissions under the carbon neutrality target.
FSV is one of the most important indicators for assessing carbon sequestration potential (Hu et al. 2020). Traditional methods for calculating above-ground forest stock volume rely mainly on field survey data, which are highly reliable (Liu et al. 2018). However, these methods are laborious and make dynamic monitoring of forest carbon stocks over large regions difficult. Such tasks are better suited to remote sensing methods, which offer large coverage, non-contact measurement, multi-temporal observation, and high spatial resolution.
Traditional optical remote sensing techniques have certain limitations in monitoring forest resources. They mostly provide only texture and spectral information of the upper forest canopy, and the spectral signal saturates easily (Duncanson et al. 2010; Lu et al. 2012). The emergence of Lidar technology has effectively alleviated this issue: Lidar pulses can partially penetrate the forest canopy, so Lidar data can accurately describe its three-dimensional structure (Nelson et al. 1984; Wilkes et al. 2018). Lidar data have been used extensively in research on FSV inversion. Yuan et al. (2021) combined airborne laser point cloud data with 800 ground sample plots to establish stock volume models for four coniferous forest types using stepwise regression and partial least squares regression, and found that partial least squares regression outperformed stepwise regression. McRoberts et al. (2012) assessed the utility of Lidar-based stratifications for estimating mean growing stock volume per unit area; stratifications based on nonlinear logistic regression predictions of volume from Lidar data reduced the variances of the mean growing stock volume estimates. Liu et al. (2023) used a multiple linear regression model to explore the relationship between forest stock volume and multi-source remote sensing features; their study identified an effective Lidar sample collection scheme for estimating forest stock and can serve as a reference for future Lidar sample collection. However, most of these studies rely on either multiple linear regression or nonlinear regression models. The RF algorithm, by contrast, requires neither statistical assumptions nor predetermined model parameters. It can handle nonlinear, interacting, and collinear predictors while resisting overfitting (Sun et al. 2021). RF has already been applied to predicting forest growth and forest carbon storage (Mina et al. 2018; Jevšenak and Skudnik 2021; Tian et al. 2022). However, relatively few studies have used RF models to predict FSV at large scales under the combined effects of multiple factors.
Mengyin County is located in southeast-central Shandong Province, China. With its high forest coverage, complex terrain, and large carbon sequestration potential, it is a representative and suitable region for research on monitoring forestry carbon sequestration with remote sensing. This article therefore takes Mengyin County as the study area, conducts an FSV inversion experiment with airborne Lidar data and the RF model, and evaluates the applicability of this technique for forest stock volume inversion in Shandong Province. The work provides a reference for applying new technologies to future forest resource surveys.

Study area
The study area, Mengyin County, lies in the Yimeng Mountain area of southeastern Shandong Province, at 117°45′-118°15′E and 35°27′-36°02′N (Mengyin County Bureau of Statistics, 2023). The county covers 1,601.6 km², with high terrain in the north and south and low terrain in the middle. It has a warm temperate continental monsoon climate with four distinct seasons; the mean annual temperature is 14.4 ℃ and the mean annual precipitation is 813.8 mm.
The main dominant tree species in Mengyin County are poplar, pine, cypress, and Robinia pseudoacacia, which together account for more than 95% of the county's forest area. Figure 1 shows the location of the study area and the distribution of the main dominant tree species.

Lidar data
The airborne Lidar data come mainly from the Shandong Provincial 14th Five-Year Basic Surveying and Mapping Planning project and have a point density of 1 pt/m². The sensor system is a CityMapper-2L with a Hyperion2 laser. The data were acquired between April and June 2023.

In situ sample plot data
Sample data were collected by measuring the height, diameter at breast height (DBH), and location of each tree, with tree species, age group, and topography fully taken into account.
A total of 71 plots, each 1,000-1,500 m² in area, were measured. The plots contain the four dominant tree species and more than 8,800 trees in total.

Map of forest resources inventory data
The forest resources inventory map was obtained from the forestry departments and is current as of 2021. Vegetation coverage and tree species types can be derived from it.

Calculation of sample plot FSV
In this research, a 20 m × 20 m grid is used as the unified analysis unit. The stock volume of each tree was calculated based on the "Timber Volume …", and the volumes of all trees within each grid cell were then summed. In total, 163 standard grid samples with FSV values were obtained: 36 poplar, 43 pine, 49 cypress, and 35 Robinia pseudoacacia samples.
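The per-tree to per-cell aggregation described above can be sketched in Python. This is a minimal illustration, assuming tree positions are already in a projected coordinate system in metres and that each tree's volume has already been computed from the volume standard cited above (which is not reproduced here); the function name `grid_volume` and the grid origin `x0`, `y0` are hypothetical.

```python
import numpy as np

CELL_SIZE = 20.0  # grid cell edge length in metres (20 m x 20 m analysis unit)


def grid_volume(x, y, tree_volume, x0=0.0, y0=0.0, cell=CELL_SIZE):
    """Sum per-tree stock volumes (m^3) into square grid cells.

    x, y        -- projected tree coordinates in metres
    tree_volume -- per-tree volume in m^3, computed beforehand
    x0, y0      -- hypothetical grid origin shared by all plots
    Returns a dict mapping (column, row) -> total volume of that cell.
    """
    cols = np.floor((np.asarray(x, dtype=float) - x0) / cell).astype(int)
    rows = np.floor((np.asarray(y, dtype=float) - y0) / cell).astype(int)
    totals = {}
    for col, row, vol in zip(cols, rows, tree_volume):
        key = (int(col), int(row))
        totals[key] = totals.get(key, 0.0) + float(vol)
    return totals


# Three trees: the first two fall in cell (0, 0), the third in cell (1, 0).
cells = grid_volume([5.0, 15.0, 25.0], [5.0, 19.0, 5.0], [0.10, 0.20, 0.30])
```

A real workflow would also have to decide how to treat cells only partly covered by a measured plot; this sketch does not address that.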

Forest characteristic parameters extraction
After coarse point cloud classification, noise removal, and vegetation reclassification, the original Lidar data were transformed into a normalized point cloud, in which the high-vegetation points identified by the vegetation reclassification are treated as trees.
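Height normalization can be illustrated with a deliberately simple sketch. It assumes the point classes follow the ASPRS LAS convention (2 = ground, 5 = high vegetation) and normalizes each point by subtracting the elevation of its horizontally nearest ground point; production workflows typically interpolate a digital terrain model instead, and the brute-force distance matrix used here only suits small tiles. The function names are hypothetical.

```python
import numpy as np

GROUND = 2            # ASPRS LAS class code for ground
HIGH_VEGETATION = 5   # ASPRS LAS class code for high vegetation


def normalize_heights(points, classes):
    """Return height above ground for every point.

    points  -- (N, 3) array of x, y, z in metres
    classes -- (N,) array of LAS class codes
    Each point's ground elevation is taken from its nearest ground point in x-y.
    """
    points = np.asarray(points, dtype=float)
    ground = points[np.asarray(classes) == GROUND]
    # Squared horizontal distances between every point and every ground point.
    d2 = ((points[:, None, :2] - ground[None, :, :2]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return points[:, 2] - ground[nearest, 2]


def high_vegetation_heights(points, classes):
    """Keep only high-vegetation points (treated as trees) and their normalized heights."""
    classes = np.asarray(classes)
    heights = normalize_heights(points, classes)
    mask = classes == HIGH_VEGETATION
    return np.asarray(points, dtype=float)[mask, :2], heights[mask]


pts = np.array([[0.0, 0.0, 100.0],    # ground
                [10.0, 0.0, 110.0],   # ground, higher on the slope
                [1.0, 0.0, 115.0],    # high vegetation
                [9.0, 0.0, 125.0],    # high vegetation
                [5.0, 0.0, 106.0]])   # low vegetation (class 3), dropped
cls = np.array([2, 2, 5, 5, 3])
xy, h = high_vegetation_heights(pts, cls)
```

Both tree points end up 15 m above their local ground even though their absolute elevations differ by 10 m, which is the purpose of normalization on sloping terrain.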
Forest characteristic parameters were then calculated from the normalized point cloud on a 20 m × 20 m grid. They comprise height variables, density variables, and vertical structure variables (Table 1); 34 forest characteristic parameters were extracted in total.

Table 1. Forest characteristic parameters
Parameter | Description
H_max | The maximum of high-vegetation height
H_min | The minimum of high-vegetation height
H_mean | The average of high-vegetation height
H_stdv | The standard deviation of high-vegetation height
H_var | The variance of high-vegetation height
hp (hp25, hp50, hp75, …) | …
Remark: n is the number of high-vegetation points in each grid cell, and $h_i$ is the height of the i-th high-vegetation point in the grid cell.
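The height variables in Table 1 can be computed per grid cell from the normalized high-vegetation heights. This sketch assumes that hp25, hp50, and hp75 are height percentiles and that H_stdv and H_var use the population form (division by n); the paper's own formulas are not reproduced in this section, so both points are assumptions. The density and vertical structure variables are not covered, and `height_metrics` is a hypothetical name.

```python
import numpy as np


def height_metrics(h):
    """Height variables for the high-vegetation points of one 20 m grid cell.

    h -- 1-D array of normalized heights (m) of the n high-vegetation points.
    Population standard deviation/variance (ddof=0) is an assumption.
    """
    h = np.asarray(h, dtype=float)
    metrics = {
        "H_max": h.max(),
        "H_min": h.min(),
        "H_mean": h.mean(),
        "H_stdv": h.std(ddof=0),
        "H_var": h.var(ddof=0),
    }
    # Assumed meaning of hp25/hp50/hp75: 25th/50th/75th height percentiles.
    for q in (25, 50, 75):
        metrics[f"hp{q}"] = float(np.percentile(h, q))
    return metrics


m = height_metrics([1.0, 2.0, 3.0, 4.0, 5.0])
```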

Model construction
In this paper, the regression model is built with RF, a popular machine learning method. RF is an ensemble learning method based on decision trees: it constructs a series of base learners on resampled data, combines their predictions, and outputs the final prediction (Zhou 2016). It can solve both regression and classification problems. The prediction of the RF regression model can be expressed as

$$\hat{y}(x) = \frac{1}{T}\sum_{t=1}^{T} h(x, \theta_t) \qquad (1)$$

where $h(x, \theta_t)$ is the t-th decision tree model, $\theta_t$ is an independently and identically distributed random variable, x is the vector of independent variables, and T is the number of decision trees.
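The averaging in the RF prediction formula above can be made concrete with a toy ensemble written from scratch. To stay short, each "tree" here is a single-split regression stump fitted on a bootstrap sample to one randomly chosen feature, so the bootstrap sample and the feature choice play the role of $\theta_t$. Real RF implementations grow much deeper trees and draw random feature subsets at every split, so this illustrates only the ensemble averaging, not the authors' model; all function names are hypothetical.

```python
import numpy as np


def fit_stump(X, y, feature):
    """Fit a one-split regression tree on a single feature by minimising squared error."""
    xs = X[:, feature]
    best_sse, best_thr, left_val, right_val = np.inf, xs.min(), y.mean(), y.mean()
    for thr in np.unique(xs)[:-1]:          # candidate split points
        left, right = y[xs <= thr], y[xs > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr, left_val, right_val = sse, thr, left.mean(), right.mean()
    return lambda Z: np.where(Z[:, feature] <= best_thr, left_val, right_val)


def fit_forest(X, y, n_trees=50, seed=0):
    """Grow n_trees stumps; theta_t = (bootstrap sample, random feature) for tree t."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)      # bootstrap resample with replacement
        feature = int(rng.integers(0, p))     # random feature for this tree
        trees.append(fit_stump(X[idx], y[idx], feature))
    return trees


def predict_forest(trees, X):
    """Average the T tree predictions: y_hat(x) = (1/T) * sum_t h(x, theta_t)."""
    return np.mean([tree(X) for tree in trees], axis=0)


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = X[:, 0]  # toy target that depends only on the first feature
trees = fit_forest(X, y, n_trees=50, seed=1)
pred = predict_forest(trees, X)
```

Averaging many high-variance learners fitted to different resamples is what gives RF its resistance to overfitting; in practice a library implementation would be used instead of this toy.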
The main steps of volume modelling are as follows.
(1) Selecting characteristic variables. Pearson's correlation analysis is conducted between 30 forest characteristic variables and stock volume, and the parameters with a correlation coefficient of at least 0.6 are selected as the characteristic variables of the RF model.
(2) Creating the training and test sets. The sample data are randomly divided into a training set (80%) and a test set (20%).
(3) Tuning hyperparameters. The optimal combination of RF hyperparameters is determined by a grid search.
(4) Building the models.
Considering the heterogeneity of different tree species, a separate model is established for each of the four dominant tree species.
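Steps (1) and (2) can be sketched with NumPy. The 0.6 threshold and the 80/20 ratio come from the text; applying the threshold to the signed coefficient (rather than its absolute value), the rounding of the training-set size, and the function names are assumptions. The hyperparameter grid search of step (3) is omitted.

```python
import numpy as np


def select_by_pearson(X, y, names, threshold=0.6):
    """Keep features whose Pearson r with stock volume is at least `threshold`."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    selected = [name for name, rj in zip(names, r) if rj >= threshold]
    return selected, r


def split_80_20(n_samples, seed=0):
    """Randomly split sample indices into 80% training and 20% test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(0.8 * n_samples))
    return order[:n_train], order[n_train:]


# Toy data: feature "a" is perfectly correlated, "b" anti-correlated, "c" uncorrelated.
y = np.arange(10, dtype=float)
X = np.column_stack([2.0 * y + 1.0, -y, (y - 4.5) ** 2])
selected, r = select_by_pearson(X, y, ["a", "b", "c"])
train_idx, test_idx = split_80_20(36)  # e.g. the 36 poplar grid samples
```

With only 35 to 49 samples per species, the test set holds 7 to 10 grid cells, so test-phase R² and rRMSE values will vary noticeably with the random split.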

Model assessment method
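Two statistical indicators, the coefficient of determination (R²) and the relative root mean square error (rRMSE), were chosen to assess the performance of the RF models. They are defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \qquad (2)$$

$$\mathrm{rRMSE} = \frac{1}{\bar{y}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \times 100\% \qquad (3)$$

where $y_i$ is the measured FSV, $\bar{y}$ is the mean measured FSV, $\hat{y}_i$ is the predicted FSV, i is the sample index, and n is the number of grid sample plots.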
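The two indicators can be computed directly from measured and predicted FSV. This minimal sketch assumes rRMSE is the RMSE divided by the mean measured FSV and expressed as a percentage, the usual convention; the function names and toy values are illustrative only.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rrmse(y_true, y_pred):
    """RMSE divided by the mean measured value, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / y_true.mean()


measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
score_r2 = r2(measured, predicted)        # 1 - 1/5
score_rrmse = rrmse(measured, predicted)  # 0.5 / 2.5 * 100
```

Because rRMSE is scaled by the mean, it lets models for species with very different typical volumes be compared on one footing, which R² alone does not.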

Model assessment results
The fitting quality of the models was evaluated with R² and rRMSE. The results (Table 2) show that all four models generalized well, with no signs of overfitting. In the training phase, the pine model performed best, with the highest R² (0.97) and the smallest rRMSE (10.26%), followed by the poplar model (R² = 0.95, rRMSE = 11.19%) and the cypress model (R² = 0.88, rRMSE = 13.81%). The Robinia pseudoacacia model performed worst, with the lowest R² (0.61) and the highest rRMSE (19.36%). In the test phase, the pine model again had the highest R² (0.94, rRMSE = 18.10%), followed by the poplar model (R² = 0.91, rRMSE = 15.25%) and the cypress model (R² = 0.81, rRMSE = 18.30%); the Robinia pseudoacacia model again performed worst, with the lowest R² (0.60) and the highest rRMSE (20.60%).

Map of the FSV estimation
The forest area of Mengyin County, comprising 592,169 grid cells of 20 m × 20 m, was extracted from the forest resources inventory map. According to the estimation results, stock volume in Mengyin County is concentrated mainly in the southwestern and northeastern regions, where the dominant landforms are low mountains and hills. As Figure 2 shows, the estimated FSV per 20 m × 20 m grid cell in Mengyin County ranged from 0.97 m³ to 10.21 m³. Among the tree species, poplar had the largest total stock volume, accounting for 44% of the total, followed by pine (40%), cypress (10%), and Robinia pseudoacacia (6%).
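The species shares reported above follow from summing the per-cell predictions by species. A minimal sketch, assuming each forest grid cell carries one predicted FSV value and one dominant-species label taken from the inventory map; `species_shares` is a hypothetical name and the toy numbers are illustrative, not study data.

```python
import numpy as np

SPECIES = ["poplar", "pine", "cypress", "Robinia pseudoacacia"]


def species_shares(cell_volume, cell_species, n_species=len(SPECIES)):
    """Total estimated FSV per species and each species' share of the county total (%)."""
    totals = np.bincount(cell_species, weights=cell_volume, minlength=n_species)
    return totals, 100.0 * totals / totals.sum()


# Toy input: four grid cells with predicted FSV (m^3) and a species index into SPECIES.
totals, shares = species_shares(np.array([1.0, 2.0, 3.0, 4.0]), np.array([0, 0, 1, 2]))
```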

Spatial pattern analysis of FSV
Once FSV had been estimated for the whole of Mengyin County, its spatial pattern was analyzed: the standard deviational ellipse was used to describe the direction and extent of the stock volume distribution, and Moran's I was used to describe its spatial autocorrelation.
The standard deviational ellipse is a spatial statistical technique for measuring the distribution pattern of geographical elements (Liu et al. 2021). Its centre, axes, and orientation summarize the central tendency, dispersion, and directional trend of FSV across the county. As Figure 3 illustrates, the ellipse is centred at 118.0086°E, 35.6976°N, has a major axis of 24.4 km and a minor axis of 13.6 km, and covers about 60% of the FSV in Mengyin County.
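A weighted standard deviational ellipse can be sketched by eigen-decomposing the FSV-weighted covariance of cell-centre coordinates: the eigenvectors give the axis directions and the square roots of the eigenvalues give the standard deviations along them. This assumes coordinates have first been projected to metres (east, north); GIS packages may rescale the axes (for example by √2), so outputs are not directly comparable to the 24.4 km and 13.6 km reported above. The function name is hypothetical.

```python
import numpy as np


def std_deviational_ellipse(x, y, w=None):
    """Weighted standard deviational ellipse from projected coordinates.

    x, y -- coordinates in metres (east, north); w -- weights such as per-cell FSV.
    Returns (centre_x, centre_y, sigma_major, sigma_minor, azimuth in degrees
    clockwise from north, in [0, 180)).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)
    cx, cy = np.average(x, weights=w), np.average(y, weights=w)
    dx, dy = x - cx, y - cy
    sxy = np.average(dx * dy, weights=w)
    cov = np.array([[np.average(dx * dx, weights=w), sxy],
                    [sxy, np.average(dy * dy, weights=w)]])
    evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    major = evecs[:, 1]                  # direction of greatest spread
    azimuth = np.degrees(np.arctan2(major[0], major[1])) % 180.0
    return cx, cy, np.sqrt(evals[1]), np.sqrt(evals[0]), azimuth


pts = np.array([[-2, -2], [-1, -1], [0, 0], [1, 1], [2, 2], [1, -1], [-1, 1]], float)
cx, cy, s_major, s_minor, az = std_deviational_ellipse(pts[:, 0], pts[:, 1])
```

The toy points spread mainly along the north-east diagonal, so the major axis points 45° east of north.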
The principal axis of the ellipse is oriented 14° east of north, indicating that stock volume in the county is concentrated mainly along a southwest-northeast axis.

Limitations and future work
Several aspects of this research require further work. First, the sample plots collected during the field survey mostly had high stock volume, and few samples had low stock volume, so the minimum estimated stock volume is relatively high. We therefore plan to expand the dataset with more representative samples, especially for poplar and Robinia pseudoacacia, to make the FSV results more comprehensive and robust. Second, while the current study evaluates the model's accuracy on its own samples, it lacks validation against third-party data. We intend to compare the estimates with external datasets to assess the model's reliability in real-world scenarios. In addition, we will explore improvements to the model architecture and hyperparameters to further enhance accuracy and efficiency. Together, these steps aim to strengthen the reliability and applicability of the model for estimating FSV in Mengyin County.

Conclusion
This study introduced a method for estimating FSV that combines low-density point cloud data with the random forest modelling algorithm. Compared with traditional forestry survey methods, it can substantially reduce labor and material costs. Building a separate model for each tree species made the models more targeted and applicable, laying a solid foundation for calculating biomass and carbon storage in future work.

Figure 1. Location and the main dominant tree species in the study area.

Figure 2. Map of the FSV estimation in Mengyin County

Figure 3. Distribution of the standard deviational ellipse of FSV

Global Moran's I of FSV in Mengyin County was calculated with the ArcGIS toolbox. The global Moran's I value was 0.77, with a p-value below 0.01 and a z-score well above 1.65, indicating significant positive spatial autocorrelation in the FSV of Mengyin County. To analyze the spatial distribution pattern of FSV more intuitively, local Moran's I was also calculated; the resulting clusters are shown in Figure 4. High-high clusters were mainly distributed in the southern mountainous areas, forming distinct contiguous zones; low-low clusters were distributed in the northern, central, and southwestern regions, forming distinct local agglomerations. Low-high and high-low values showed no distinct clustering and were scattered only around the peripheries of the high-high and low-low regions.
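Global Moran's I on a regular grid can be sketched with NumPy as $I = \frac{n}{S_0}\frac{\sum_i\sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i$ is the deviation of cell i from the mean FSV and $S_0$ is the sum of all weights. This assumes binary rook-contiguity weights between neighbouring forest cells, with non-forest cells stored as NaN and excluded. The ArcGIS tool used in the study offers several spatial-weight conceptualizations and also reports z-scores and p-values, none of which this sketch reproduces, so its values need not match the 0.77 reported above.

```python
import numpy as np


def global_morans_i(grid):
    """Global Moran's I for a 2-D raster with binary rook-contiguity weights.

    grid -- 2-D array of FSV per cell; NaN marks non-forest cells, which are
            excluded both as observations and as neighbours.
    """
    grid = np.asarray(grid, dtype=float)
    valid = ~np.isnan(grid)
    z = np.where(valid, grid - np.nanmean(grid), 0.0)  # deviations from the mean
    n = valid.sum()

    cross = 0.0   # sum over i, j of w_ij * z_i * z_j
    s0 = 0.0      # sum of all weights w_ij
    # Horizontal and vertical neighbour pairs; each pair counts twice (i->j and j->i).
    for a, b, ok in (
        (z[:, :-1], z[:, 1:], valid[:, :-1] & valid[:, 1:]),
        (z[:-1, :], z[1:, :], valid[:-1, :] & valid[1:, :]),
    ):
        cross += 2.0 * np.sum(a * b * ok)
        s0 += 2.0 * ok.sum()
    return (n / s0) * cross / np.sum(z ** 2)


checkerboard = (np.indices((4, 4)).sum(axis=0) % 2) * 2.0 - 1.0  # perfect dispersion
halves = np.tile([1.0, 1.0, 0.0, 0.0], (4, 1))                   # two contiguous blocks
i_check = global_morans_i(checkerboard)
i_halves = global_morans_i(halves)
```

The checkerboard, where every neighbour differs, gives the extreme negative value, while contiguous blocks of similar values give a positive I, the situation the 0.77 value describes.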

Figure 4. Cluster types of estimated FSV

Table 2. Accuracy assessment of four dominant tree species models.