Development of a Precise Tree Structure from LiDAR Point Clouds

A precise tree structure that represents the distribution of tree stem, branches, and leaves is crucial for accurately capturing the full representation of a tree. Light Detection and Ranging (LiDAR)-based three-dimensional (3D) point clouds (PCs) capture the geometry of scanned objects including forests stands and individual trees. PCs are irregular, unstructured, often noisy, and contaminated by outliers. Researchers have struggled to develop methods to separate leaves and wood without losing the tree geometry. This paper proposes a solution that employs only the spatial coordinates ( x , y , z ) of the PC. The new algorithm works as a filtering approach, utilizing multi-scale neighborhood-based geometric features (GFs) e.g., linearity, planarity, and verticality to classify linear (wood) and non-linear (leaf) points. This involves finding potential wood points and coupling them with an octree-based segmentation to develop a tree architecture. The main contributions of this paper are (i) investigating the potential of different GFs to split linear and non-linear points, (ii) introducing a novel method that pointwise classifies leaf and wood points, and (iii) developing a precise 3D tree structure. The performance of the new algorithm has been demonstrated through


Introduction
Understanding tree topology, including the separation of leaf and woody materials (e.g., stem, branches, and twigs) has important implications for forest-urban management, ecosystem functions, and tree information modeling (Wang et al., 2020;Hui et al., 2021;Shu et al., 2022;Tian and Li, 2022).This involves estimation of precise tree parameters such as diameter at breast height (DBH) [Nurunnabi et al., 2024], canopy profile (Zhao et al., 2015), leaf angle (Zhao et al., 2015), stem volume, and leaf area index (LAI) [Beland et al., 2014;Li et al., 2017].Additionally, it encompasses the quantification of above-ground biomass (AGB), and carbon sequestration capacity (Claders et al., 2015).Image-based methods are not well-suited for accurately assessing the 3D geometry of objects because objects may appear mixed or overlapped within two-dimensional (2D) image pixels, thereby limiting the ability to accurately assess geometry.Conversely, Light Detection and Ranging (LiDAR)based scanning systems-Terrestrial, Mobile, and Aerial Laser Scanning (TLS, MLS, and ALS)-provide 3D georeferenced points called point clouds (PCs) that capture detailed geometry of the scanned objects.TLS in forestry and ecology allow individual tree parameters estimation and forest stands structural measurements (Raumonen et al., 2013;Zhao et al., 2015;Tian and Li, 2022).PC processing is primarily challenging due to its unstructured, unlabelled, and inhomogeneous nature, and the presence of outliers (Nurunnabi et al., 2012).Additional difficulties emerge from the mixed and occlusion effects caused by finer branches, tiny twigs, and leaves (Vicari et al., 2019;Hui et al., 2021).
Radiometric features pertain to the intensity and reflectivity of laser returns at specific wavelengths.The availability and reliability of radiometric information, which encompass color and intensity, may be impacted by ambient lighting.The intensity of laser points is contingent upon variations in their optical properties at the LiDAR sensor's wavelength (Béland et al., 2014).Furthermore, the intensity of laser returns is sensitive to both distance and the incidence angle to the scanned objects, making radiometric calibration of intensity challenging (Côté, et al.,2011).In fact, Tao et al. (2015) asserted that the intensitybased approach is unsuitable for dense canopies due to the potential generation of misleading intensity values from partial hits.The presence of a dense canopy can also increase the occurrence of mixed returns.In contrast, GF-based methods mainly consider the coordinates (x, y, z) of the points (including partial returns) and utilize the underlying geometric patterns (features) of points based on their local neighbors.When compared with radiometric feature-based approaches, geometric feature-based approaches often yield better classification accuracy (Vicary et al., 2019;Wang et al., 2020).Some methods use a combination of both radiometric and geometric features to leverage the respective benefits of each feature (Zhu et al., 2018).Grau et al. (2017) pointed out that when utilizing point clouds in a volumetric sense, considering the mixture of materials can be critically important.
With complex tree structures, many methods fail to expose finer branching structures and misclassify tiny twigs as leaf points (Hui et al., 2021).The mislabelling of twigs as leaves may inaccurately inflate AGB and LAI values (Claders et al., 2015).To date, researchers have struggled to develop methods that efficiently distinguish leaf from wood points without sacrificing geometric accuracy (Wang et al., 2020;Hui et al., 2021).Hence, the primary hurdles in this endeavor include ensuring sufficient accuracy, robustness, and reliability.
The approach proposed herein avoids radiometric information, employs only GFs at a multi-scale neighborhood level.This involves finding potential wood points and coupling them with an octree-based segmentation to develop the tree architecture.The main scientific contributions of this paper are (i) benchmarking potential GFs to separate linear and non-linear points, and (ii) developing a pointwise method to classifies tree leaf and wood points for a detailed tree architecture.
The remaining paper has 4 sections.Section 2 briefly presents related literature.Section 3 introduces the algorithmic methodology.Section 4 applies the algorithm with two realworld TLS data sets, and Section 5 concludes the paper.

Related Literature
As previously mentioned, features for leaf-wood separation and 3D tree structure development using PCs have been radiometric feature-based, geometric feature-based, or a combination.One of the first effort was done by Côté et al. (2009), who aimed to reconstruct a 3D tree architecture by segmenting the PC between wood and leaf components based on intensity information.Béland et al. (2011Béland et al. ( , 2014) ) used the intensity returns for woodleaf separation.
An early notable GF-based contribution is the work by Raumonen et al. (2013) who focused on tree branch segmentation via flexible cylindrical model to represent both trunk and branches elements.The reconstruction utilizes neighbor relations, as well as geometrical properties.Contemporaneously, Belton et al. (2013) applied GFs for developing a Gaussian mixture model (GMM) that classifies tree points into leaves, trunks and unknown.Subsequently, Tao et al. (2015) developed a geometric-based wood-leaf separator and favorably benchmarked it against an intensity-based approach.Continuing in that spirit, Ma et al. (2016) calculated a set of GFs for another GMM for the classification of photosynthetic (i.e., leaf, flowers, etc.) and non-photosynthetic (wood; stem, branches, etc.) components.Similarly, Wang et al. (2017) assessed 26 GFs via 4 machine learning methods: support vector machine, naive Bayes, random forest (RF), and GMM.The authors concluded that RF was the most accurate classifier.Similar efforts were undertaken by Ferrara et al. (2018) used GFs and the density-based clustering algorithm, DBSCAN.Zhu et al. (2018) employed an adaptive neighborhood search algorithm as an alternative approach using multiple radiometric and geometric features for classification using a RF classifier.They concluded that GFs are more important than radiometric features.
More recently, a generic and fully automatic leaf-wood separation (LeWoS) procedure was introduced by Wang et al. (2020) utilizing GFs.The work employed a set of recursive point cloud segmentation and regularization procedures.
In another novel approach, Tian and Li (2022) developed a graph-based leaf-wood separation (GBS) method.GBS utilized the shortest path-based features in tree point cloud segmentation, cluster recognition and region growing (1) using a weighted undirected connected graph that is constructed based on a single tree point cloud; (2) then segmenting that into homogeneous clusters at multiple scales; (3) next finding cylindrical and/or linear characteristics; and (4) using points on those elements as initial seed points to find irregular clusters collected using a region-growing-based approach.To demonstrate their method.multiple tree species and sizes were tested.
Identifying the shape and topological structure of a tree has also been a longstanding topic of interest.Early on, Bucksh and Lindenbergh ( 2008) proposed an octree-graph based method for tree skeleton extraction, while Fu et al. (2020) employed level sets and cylindrical shapes.An approach developed by Li et al. (2022) to generate refined tree skeleton include separation, oversegmentation, and shrinkage of main branches.Notably, Hui et al. (2021) developed a hybrid method that combines graph-based and GF-based methods.
Recently, deep learning (DL) methods (Garcia-Garcia et al., 2017) have emerged for various PC processing tasks, such as segmentation and classification, which hold significant implications in forest analysis.Zou et al. (2017) introduced a DL-based algorithm for tree classification, leveraging low-level feature representation through voxel-based rasterization.Xi et al. (2018) employed a 3D fully convolutional neural network for filtering tree stems and branches.

Methodology
Herein a 3-step algorithm for wood and leaf separation is proposed.The first two steps aim to overcome problems related to ground points, noise and outliers.The flowchart of the new algorithm is shown in Fig. 1.

Step 1. Ground points filtering
For this, the ground filtering algorithm proposed by Nurunnabi et al. (2016) is employed, which couples robust statistical approaches with locally weighted regression.For this method, the data set is divided into manageable strips.The algorithm is executed on each strip, and the results are subsequently merged.For each strip, the algorithm operates iteratively on two orthogonal profiles (x-z and y-z).The iterative process consists of two main steps.In the first step, robust locally weighted regression (RLWR) is employed locally to obtain a nonlinear fit for the entire strip.The second step involves iterative fitting for each strip, where the z (elevation) values are down-weighted based on the residuals of the previous fit.The process of fitting and down-weighting continues until it reaches the ground level, and the ground points are identified as those that are close to the ground level within a specified threshold.This method is highly robust in the presence of outliers.The reader is referred to Nurunnabi et al. (2016) for additional details.

Step 2. Denoising and outliers removal
Denoising and outlier removal are achieved by applying a statistically robust algorithm developed in Nurunnabi et al. (2015).In that, a maximum consistency with minimum distance (MCMD)-based Z-score is employed as a well-known distancebased statistical measure that often used to identify outliers.It locates outliers locally that are significantly distant from the maximum consistent set (MCS) within a local neighborhood.The MCS was developed by utlizing a PCA-based plane fitting.For more detailed information about the algorithm, the reader can refer to the original paper (Nurunnabi et al., 2015).

Step 3. Identification, classification and reconstruction of linear (wood) and non-linear (leaf) structures
This is the main component of the proposed algorithm and the predominant contribution of this paper.This step includes separating wood (linear) and leaf (non-linear) points to uncover the distribution of tree stem and branches and, ultimately, the tree skeleton.The tree skeleton can be articulated as a non-linear structure, which is mainly a combination of the tree stem and branches that are actuallly linear structures within a local region.To establish this, an iterative process of (i) classification and (ii) segmentation is employed.

Task (i), Geometric feature estimation and classification:
Principal Component Analysis (PCA) is a wellknown (Brodu and Logue, 2012;Böhm, et al., 2016) simple statistical method that can identify the related directions of points (in a cloud) in a local neighborhood.Here PCA is performed on the covariance matrix () of the local neighbors for each point of a tree PC (P).The matrix  of k points in a neighborhood of a point of interest   (, , )   can be defined as: where is the mean of the points in the local neighborhood.To perform PCA, the following eigenvalue equation (Eq.2) is solved using the well-known singular value decomposition (SVD) technique, where V is a matrix containing the eigenvectors as its columns, and  is a diagonal matrix that contains eigenvalues in its  We surveyed the relevant literature (e.g., Hackel et al., 2016;Ma et al., 2016;Nurunnabi et al., 2021) to identify the most informative GFs for detecting points that predominantly exhibit linear patterns, such as those found for wood (stem and branches) points.By employing formulas for GFs, namely linearity (L), planarity (P), and verticality (V) as outlined in Table 1, we are able to effectively extract wood points representing linear, planar, or vertical structures.
Brodu and Lague ( 2012) noted that by integrating information from various scales, creates signatures capable of identifying categories of objects.Wang et al. (2015) demonstrated that multiscale features effectively extract shape features from complex PCs.They asserted that multiscale and hierarchical features are discriminative and robust, particularly when dealing with PCs exhibiting varying point density and missing data.
Due to the variation in tree species, branch/stem sizes, and point density, we estimate GFs based on multiple sizes of the neghborhood Radius (nR).In Fig. 3 we have a motivating finding: points on a stem (which is typically linear) can be represented as linear/planar and vice versa, depending on their position and the size of the local neighborhood.Although we consider L, P, and V, our primary focus is on L. We estimate multi-scale neighborhood based L values to extract the majority of wood points from stems and branches.We consider multiple (n) sizes of neighborhood radii, and experienced 5 to 7 different sizes of raddi is good enough to collect the wood points from different parts (stump/stem/branch) of the tree.By the definitions of L, P and S in Table 1, we have L+P+S = 1.Points with significantly higher values of L (or P) compared to others will be categorized as linear (or planar) accordingly.We classify a point as linear, planar, or vertical if their corresponding values of L, P, or V exceed their respective thresholds ( * ℎ ), indicating L ≥  ℎ , P ≥  ℎ or V ≥  ℎ , and consider them potential wood points (PWP).In this paper, we fix  ℎ =  ℎ =  ℎ = 0.60.Points on a finer branch shows as linear with a neighborhood of a smaller radius, whereas a point on a heavier branch is revealed as linear when a larger neighborhood is counted.
To investigate any potential relationship between nR and DBH, we selected 15 trees (Pine, Oak and Far) with varying DBH sizes, ranging from 15.1cm to 86.7cm.To ensure consistency in point density, we subsampled the data with a 0.05m point spacing.
Setting a planarity threshold of greater than 0.70, we identified the sizes of nR for which a point around the DBH of a tree can be classified as a planar point.We generated a line diagram based on the estimated nRs versus respective DBHs.Fig. 5 illustrates that the size of nR has a positive correlation with the DBH of a tree when identifying planarity.Therefore, we posit that DBH can serve as a suitable reference for determining an appropriate size of nR.Users can fix the sizes of nR based on their specific dataset requirements.
Concurring with existing literature, we note that wood points are predominantly identified by L, P, and/or V. Furthermore, we discover that the 3rd eigenvalue,  0 (which quantifies variation along a point normal,  0 ) can aid in identifying wood points, particularly for finer branches.We integrate  0 with a very small threshhold ( 0Th ≤ 0.0005) to identify wood points that might originate from the 3rd and 4th levels of branches.

Task (ii), Segmentation and refinement:
We identify potential wood points (PWPs): linear with multi-scale neighborhood,  1 , …,   , and using  0 values with  1 , also planar and vertical points with  1 in Task (i).For each nR, after collecting potential wood points (linear, planar and vertical), we perform an octree (voxel)-based connected component segmentation (CCS) [e.g., Vo et al., 2015] to find connected components and filter out leaf points that are wrongly identified as wood points, logically leaf points are scatterded and mostly disconnected as a linear component.Identifying PWP before implementing CCS facilitates breaking the connection between wood and leaf points.Otherwise, the strong nexus between wood and leaf points impedes effective segmentation.We disregard segments that have few (e.g., 200) points.In the final step, we consolidate wood points identified from all pairs of classification and CCS, eliminated duplicates, and performed a final round of CCS.The outcomes were then categorized into wood and leaf.

Methods and metrics used for comparison
In this section, the effectiveness of the proposed method is benchmarked (quantitively and qualitatively) with two TLS data sets against two recently developed approaches: LeWoS (Wang et al., 2020) employing a geometric approach and GBS, a graphbased method developed by Tian and Li (2022).Those authors asserted that their methods offer automatic functionality and high adaptability across various tree species and sizes.
Assessment involved comparing the predicted labels against manual (ground truth) labels and computing performance metrics based on both the manual and predicted labels obtained from various methods.The methods' performance was assessed utilizing four well-known metrics: precision, recall, F1-score, and overall accuracy (OA).These metrics are defined as follows: where True Positive, False Positive, True Negative, and False Negative are abbreviated as TP, FP, TN and FN, respectively.

Experiment 1, Scots pine tree data:
A Scots pine (Pinus sylvestris) tree was selected from Weiser et al. (2022), where the data were collected with a RIGEL VZ-400 TLS system in a German Forest in 2019.Excluding the ground points, the tree in Fig. 6(a) consists of 570,337 points.It is of 28.28m height and DBH 46.2cm.Wang et al. (2020) manually classified the tree data into leaf and wood points, and used the classified data for validation of their method.Later, the tree is used in evaluation for the method in Tian and Li (2022).Similar to Tian and Li (2022), in order to mitigate accuracy bias stemming from irregularly distributed points, minimize computational load, and maintain a consistent distance between adjacent points, we subsampled the data with a point spacing of 0.02m.Also we did some manual classification to be precised.Our method was applied to estimate the GFs (L, P, V and  0 ).We find potential linear points using the L values obtained from six different scales of local neighborhood, nR = 0.05m, 0.1m, 0.2m, 0.3m, 0.4m and 0.5m.Additionally, potential planar and vertical points were identified using P and V values of nR = 0.1m.We calculated  0 of nR = 0.1m with  0ℎ = 0.0005 to find wood points that originating from thin (3rd and 4th levels) brances.Furthermore, we conducted the GBS method (Tian and Li, 2022), and the LeWoS method (Wang et al., 2020).The outcomes are detailed in Table 2 and visually depicted in Fig. 6.We observe that our method consistently produces superior results across various metrics compared to others, boasting an overall accuracy (OA) and F1-score of 97.9% and 94% (wood), respectively.LeWoS, achieves higher OA and F1-scores for both leaves and wood compared to GBS.However, GBS demonstrates significantly better recall for wood points (77.3%) than LeWoS (68.7%).Fig. 6 (d) reveals that LeWoS misclassifies numerous wood points (depicted in red) as leaves, while in Fig. 6 (g), GBS incorrectly identifies many leaf points (green) as wood.Additionally, as illustrated in Fig. 6 (i), GBS fails to recognize numerous stem points (in the magenta ellipse) as wood points.(Takoudjou et al., 2018).The tree comprises 462,932 points with a height of 43.68m and a DBH of 116.6cm.Wang et al. (2020) employed this data set to assess the LeWoS algorithm.In their study, Wang et al. (2020) manually categorized the data into leaf and wood points.Subsequently, following Tian and Li (2022), we downsampled the data with a point spacing of 0.05m.We conducted additional manual corrections to rectify some misclassifications in the data set.The tree data were classified as leaf and wood points of 37,3985 and 88,947, respectively.Our method was performed on the Okan tree data along with the GBS (Tian and Li, 2022), and LeWoS (Wang et al., 2020).Likewise the first experiment, we find potential linear points using the L values obtained from 5 different scales of local neighborhood, nR = 0.2m, 0.4m, 0.6m, 0.8m and 1.0m.Potential planar and vertical points were identified using P and V values of nR = 0.2m.We calculated  0 of nR = 0.2m with  0ℎ = 0.0005 to find wood points that originating from thiner brances.The results are presented in

Conclusions
In this paper, we introduce a method for individual tree leafwood classification based on GFs that relies solely on spatial coordinates (x, y, z) of each point, without considering any radiometric feature.The algorithm employs classification and segmentation, utilizing only four GFs (L, P, V, and  0 ).While leaf points may be incorrectly identified as wood points during the classification step, the incorporation of multi-scale features provides sufficient discriminative power to distinguish between wood and leaf points.Additionally, connected component segmentation (CCS) is employed to reassign misclassified points to their appropriate groups.
Precise wood classification enables the development of an accurate tree structure (skeleton), holding promise for 3D tree modeling.This new method shows potential for comprehending the biophysical processes of trees and estimating various tree parameters such as wood volume, height, and leaf area index.Notably, the method is independent to tree species and size.However, a limitation lies in the necessity for users to determine appropriate sizes for the multiple local neighborhoods to estimate geometric features (GFs).However, neighborhood sizes can be fixed based on the DBH information of a specific tree.Future research will focus on further separation of tree twigs and leaves, as well as reconstructing the irregular geometric shapes of tree stems and branches to enhance the precision of tree modeling.We aim to incorporate data from more intricate environments and mixed forests to broaden the applicability and robustness of our method.
diagonal.The results yield three eigenvectors  2 ,  1 and  0 , along with three respective eigenvalues  2 ,  1 and  0 ( 2 ≥  1 ≥  0 ≥ 0).The eigenvectors are known as the principal components.They represent the directions of maximum variance in the data (neighbors).These directions with corresponding eigenvalues can correspond to the main GFs.The 1st principal component ( 2 ) explains the direction of the most data variability, and the 2nd and 3rd principal components progressively less.Thus, PCA highlights the dominant patterns and structures within a PC.Fig.2explains how, a PC geometrically appears at a specific position within a local neighborhood (with a given scale), indicating whether a point resembles a volumetric shape, a line, or a plane.If all three eigenvalues are relatively equal, the points are spread out almost evenly in all directions, suggesting a volumetric or spherical shape.If one eigenvalue ( 2 ) is significantly larger than the others ( 1 and  0 ), it indicates that points are more along one direction, thus suggesting a linear shape.If two eigenvalues ( 2 and  1 ) are significantly larger than the third ( 0 ), it suggests that points are concentrated along a plane, indicating a planar structure.

Figure 2 .
Figure 2. (a) a spherical neighbourhood, points are scattered, (b) points represent a linear surface, and (c) points represent a planar surface.
Fig 4. illustrates that how L values from multi-scale neighborhoods show linear structure from different parts of a tree.

Figure 3 .
Figure 3. (a) A point   on a part of a stem (linear structure) is identified as a planar point with nR = 0.1m (green area), and as a linear point when nR = 0.5m (cyan area), (b) line diagram shows impact of neighborhood size on GFs.

Figure 4 .
Figure 4. Search for wood (linear/planar/vertical surface) points, (a) linear (red) points from 3rd level branch; nR=0.05m,(b) linear (red) points from 1st level branch; nR=0.3m,(c) vertical (red) points from stem; nR=0.1m.Points on a finer branch shows as linear with a neighborhood of a smaller radius, whereas a point on a heavier branch is revealed as linear when a larger neighborhood is counted.

Figure 5 .
Figure 5. Line diagram of neighborhood radius, nR versus DBH for 15 trees.

Figure 6 .
Figure 6.(a) Scots pine tree point cloud.Manually classified points: (b) wood, (c) leaf.Classification results of LeWoS are in (d), (e) and (f): (d) blue points are correctly classified wood points, red and green points are wrongly identified leaf (FN) and wood (FP) points, respectively, identified points are (e) wood, and (f) leaf.Likewise, LeWoS with similar order, classification results of GBS are in (g), (h) and (i), and our results are in (j), (k), and (l).

Table 2 .
Classification results of Scots pine tree data using LeWoS, GBS, and our methods.OA (overall accuracy).

Table 3 .
Classification results of Okan tree data using LeWoS, GBS, and our methods.