Exploitation of the Number of Return Echoes for DTM Extraction from Point Clouds Acquired by LiDAR UAS DJI Zenmuse L1

Following the enormous technological development of LiDAR (Light Detection And Ranging) sensors, it is now easier to find them commercially in the UAS (Uncrewed Aerial Systems) sector. In particular, with the Zenmuse L1 by DJI (Dà-Jiāng Innovations) the market has grown globally, mainly thanks to the compactness of the product, which is easily compatible with UAS. The L1 sensor can record up to three returns of the emitted signal, so it can acquire a larger number of points, such as those below the vegetation. Therefore, in addition to the geometric information of the points, the Zenmuse L1 point clouds also provide other information, such as the number of echo returns, from 1 to 3. These data could be exploited to improve the automatic extraction of the digital terrain model (DTM) from the point clouds, hopefully avoiding manual correction. This research focuses on evaluating whether the addition of the return number feature can affect the identification of ground points through different computational methods and can improve the time efficiency of state-of-the-art algorithms.


Introduction
The advent of Light Detection and Ranging (LiDAR) technology has revolutionized the way we perceive and understand landscapes. LiDAR systems capture vast amounts of data, generating highly detailed three-dimensional point clouds. In recent years, with the DJI solutions on the market, LiDAR systems have become very compact and affordable, guaranteeing an even higher level of detail thanks to the possibility of mounting them on Unmanned Aerial Vehicles (UAVs). The so-obtained point clouds serve as a rich source of information, invaluable for various applications ranging from urban planning to environmental monitoring. However, within these dense point clouds lies a wealth of nuanced data that requires careful interpretation and extraction. One crucial aspect that significantly influences the accuracy and utility of derived products, such as Digital Terrain Models (DTMs), is the return number associated with each LiDAR point. Return numbers, or echoes, categorize individual points based on the number of times the laser pulse is reflected back to the sensor. These can range from first returns, representing the surface of objects like trees and buildings, to multiple returns, indicative of complex terrain features or vegetation layers. While all returns contribute to the overall point cloud, understanding and leveraging the return numbers could be paramount, particularly in DTM extraction processes, since DTMs need to be devoid of above-ground features such as vegetation and structures. Accurate DTMs are essential for applications like flood modelling (Alho et al., 2009), viewshed analysis (Maloy et al., 2001), and slope stability assessment (Subramaniam et al., 2022). The precision of these models hinges on the ability to distinguish between ground points and non-ground points within the LiDAR point cloud. In this context, the utilization of return numbers emerges as a critical factor in refining DTM extraction methodologies. By differentiating between first returns, which predominantly represent ground surfaces, and subsequent returns, which may signify vegetation or man-made structures, it is possible to enhance the accuracy of terrain modelling processes. Leveraging the information encoded in return numbers enables the development of robust algorithms tailored to isolate ground points with greater precision, thus yielding more accurate and reliable DTMs (Chen et al., 2017; Hyyppä et al., 2005; Štroner et al., 2023).
In particular, in this contribution, we investigate, on the one hand, whether it is possible to use the return number to improve the computing time of state-of-the-art morphological algorithms and, on the other hand, whether it can be considered effective for point cloud semantic segmentation through Machine Learning (ML) algorithms such as the Random Forest. This investigation was carried out on the data derived from the previously mentioned DJI Zenmuse L1 sensor, one of whose most interesting features is the integration of the LiDAR, the RGB camera and the IMU (Inertial Measurement Unit) module in a single piece of hardware mounted on a 3-axis stabilized gimbal. Table 1 shows some specifications (Teppati Losè et al., 2022).
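As a minimal illustration of how the return-number attribute can be used in practice, the sketch below flags single-echo points and last echoes of multi-return pulses as first-pass ground candidates. It works on plain NumPy arrays: the names `return_number` and `number_of_returns` are stand-ins for the corresponding LAS point attributes, not any specific library API.

```python
import numpy as np

# Toy stand-ins for the per-point LAS attributes of five points
return_number = np.array([1, 1, 2, 3, 1])      # echo index of each point (1-3 on the L1)
number_of_returns = np.array([1, 3, 3, 3, 1])  # total echoes of the originating pulse

# Single-echo pulses: the pulse hit one surface only, typically bare ground or a roof
single = (return_number == 1) & (number_of_returns == 1)

# Last echoes of multi-return pulses: candidates for ground below vegetation
last_of_many = (return_number == number_of_returns) & (number_of_returns > 1)

ground_candidates = single | last_of_many
print(ground_candidates)  # → [ True False False  True  True]
```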

The case study
Three different methodologies have been applied to evaluate whether the addition of the return number information improves the discrimination of the ground points. Each methodology is based on a different classification strategy and focuses on investigating different insights. The case study depicts a mountain area in the western Italian Alpine region, with slopes and different kinds of patterns where small and medium-sized shrubs alternate with large areas of both flat and noisy ground (Figure 1). A small testbed area was selected out of the whole point cloud for the preliminary tests and to set up and validate the algorithms. It included only three categories (ground, road and vegetation) and counted 651,898 points out of 53,303,310; we will refer to it as Sample 1 (Figure 2).
A second, wider sample (Sample 2) was then selected, with 8,679,442 points and additional categories such as buildings and electricity poles (Figure 3). Both point clouds had the standard features directly derived from the Zenmuse L1 LiDAR: RGB, scan angle rank, GPS time, number of returns and intensity.
Table 2 shows the main features of the two samples.

Methodology and results
Three algorithms were tested, each of which defines a different method and approach to investigate whether the information given by the return number can be useful for extracting the ground points. These methods were applied to both S1 and S2, and a confusion matrix was calculated to evaluate the performances. In addition, the processing time was recorded to compare the timings. Methods 1 and 2 were developed in MATLAB, involve the geometric structure of the point clouds and were compared to assess the effectiveness of Method 2, while Method 3 consists of the Random Forest ML classifier.
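The evaluation described above (a confusion matrix against the manual classification, plus overall accuracy) can be sketched as follows; the label encoding (0 = off-ground, 1 = ground) is an assumption for illustration, mirroring the output convention of the MATLAB ground-segmentation function used later.

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred, n_classes=2):
    """Confusion matrix (rows = manual labels, cols = predictions) and overall accuracy."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    overall_accuracy = np.trace(cm) / cm.sum()
    return cm, overall_accuracy

# Toy check: 5 ground / 3 off-ground points, with one error of each kind
manual = np.array([1, 1, 1, 1, 1, 0, 0, 0])
predicted = np.array([1, 1, 1, 1, 0, 0, 0, 1])
cm, oa = confusion_and_accuracy(manual, predicted)
print(oa)  # → 0.75
```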

Method 1: simple morphological filter algorithm
The first method to be tested (which constituted the reference one) consisted of the simple morphological filter (SMRF) algorithm, which extracts the ground points by evaluating only the geometry of each individual point (Pingel et al., 2013). In this case, the return number information is not used. The algorithm consists of three steps: 1) creating a minimum elevation surface map from the point cloud data; 2) segmenting the surface map into ground and off-ground grid elements; 3) segmenting the original point cloud data and subsampling it with a grid-spaced approach. To create the minimum elevation surface map, the process begins by dividing the point cloud data into grids of a specified size along the xy-plane.
Subsequently, for each grid element (pixel), the lowest elevation value (Zmin) is found. To generate a segmented surface map, the process starts by applying a morphological opening operation to the minimum surface map. This operation entails the consecutive application of an erosion filter and a dilation filter. The structuring element's shape and its window radius play a crucial role in defining the search neighbourhood for this operation. Typically, a disc-shaped structuring element is used, starting with a window radius of 1 pixel. Following this, the slope between the minimum surface and the opened surface maps is computed at each grid element. The pixel is categorised as non-ground if the disparity surpasses the elevation threshold. These steps are carried out iteratively, progressively increasing the window radius by 1 pixel in each iteration until it reaches the specified maximum radius.
The original point cloud is then segmented as follows: initially, the binary mask generated earlier is applied to the original minimum surface map to filter out non-ground grids. Subsequently, the vacant grids are filled using image interpolation techniques to construct an estimated elevation model. Then, the elevation difference between every point in the original point cloud and its corresponding point in the estimated elevation model is computed. Points with discrepancies exceeding the threshold are classified as non-ground. Moreover, the incline of the elevation model at each point is scaled and added to the threshold value; this step aids in pinpointing ground points on steep slopes. Through this iterative process, a binary mask is derived in which each point of the cloud is classified as either ground or non-ground based on the segmentation criteria. The algorithm has been implemented in the function segmentGroundSMRF(..) since MATLAB version R2021a. The argument to be passed to the function is the point cloud, while the result is a logical vector, where the value 0 represents off-ground points and 1 represents ground points.
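A compact Python sketch of the core SMRF loop (not the authors' MATLAB `segmentGroundSMRF` implementation, and with a square window instead of a disc for brevity) may help fix ideas: the minimum-elevation surface is opened with a growing window, and cells whose elevation rises above the opened surface faster than the slope threshold allows are flagged as non-ground. All parameter values are illustrative.

```python
import numpy as np

def _filt(surface, radius, func):
    """Square min/max filter of the given radius (erosion when func=np.min, dilation when np.max)."""
    p = np.pad(surface, radius, mode="edge")
    shifts = [p[radius + dy:radius + dy + surface.shape[0],
               radius + dx:radius + dx + surface.shape[1]]
              for dy in range(-radius, radius + 1)
              for dx in range(-radius, radius + 1)]
    return func(np.stack(shifts), axis=0)

def smrf_ground_cells(zmin, cell_size=1.0, max_radius=3, slope=0.15):
    """Simplified SMRF step 2: True where a minimum-surface grid cell is classified as ground."""
    ground = np.ones(zmin.shape, dtype=bool)
    surface = zmin.astype(float)
    for radius in range(1, max_radius + 1):
        # morphological opening: erosion followed by dilation
        opened = _filt(_filt(surface, radius, np.min), radius, np.max)
        # elevation tolerance grows with the window radius and the permitted slope
        threshold = slope * radius * cell_size
        ground &= (surface - opened) <= threshold
        surface = opened
    return ground

# Toy check: a 5 m spike (e.g. a tree crown) on otherwise flat ground
grid = np.zeros((10, 10)); grid[5, 5] = 5.0
mask = smrf_ground_cells(grid, cell_size=1.0, max_radius=2, slope=0.2)
print(mask[5, 5], mask[0, 0])  # → False True
```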

Sample 1:
For the smallest area, the code executed quite quickly, totalling 8 seconds, and the result was considered satisfactory, with an overall accuracy higher than 98% (Figure 4). The next step was to compare the result with a manual classification of the point cloud; the results are shown in the confusion matrix CMM1,S1 (Method 1, Sample 1) (Table 3).

Sample 2:
As for S2 (13 times larger than S1), the code processed fairly quickly, for a total of 14 seconds. The result still showed high performance, with an overall accuracy of 95%, and with buildings and urban elements properly removed, as well as the vegetation (Figure 5). As for S1, a comparison with the manual classification was performed; these results are contained in the confusion matrix CMM1,S2 (Table 4).

Method 2: geometry-based approach with return number embedding
The second method is very similar to the previous one, but it adds the return number to investigate the possibility of making Method 1 even faster. In particular, a mask of size n x n is created and iterated through the point cloud as a kernel; then, for each mask position, the smallest z value among all the points is stored. All the points that exceed a certain threshold above the minimum z are classified as off-ground. Unlike the previous method, the return number data is partially used: since the bare terrain areas are mainly characterized by a return number of 1 (as depicted in Figures 2 and 3), the method bypasses kernels containing only values of 1, directly assigning them the ground class. By taking advantage of this feature it is possible to make the first method faster and more efficient, while also keeping a higher number of points. However, the potential of the echoes is still not fully addressed and evaluated; the third method addresses this. Several tests were performed to find the best combination of kernel size and minimum z-threshold; the values giving the best result without a large increase in processing time are 5 meters x 5 meters for the mask and 2 meters for the z-threshold.
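The kernel logic just described can be sketched in Python as follows (the paper's implementation is in MATLAB; this is a NumPy re-expression under the stated assumptions of a 5 m cell and a 2 m z-threshold, with `method2_ground_mask` as a hypothetical name). The branch that passes all-first-return cells straight through as ground is the shortcut that makes the method faster.

```python
import numpy as np

def method2_ground_mask(x, y, z, return_number, cell=5.0, dz=2.0):
    """Per-point ground mask: a point is off-ground when it lies more than dz above
    the lowest point of its cell, unless all the cell's echoes are first returns
    (bare-terrain shortcut: the whole cell is accepted as ground directly)."""
    ix = ((x - x.min()) // cell).astype(int)
    iy = ((y - y.min()) // cell).astype(int)
    ground = np.zeros(x.shape, dtype=bool)
    for cx, cy in set(zip(ix, iy)):
        sel = (ix == cx) & (iy == cy)
        if np.all(return_number[sel] == 1):   # only first returns: assume bare ground
            ground[sel] = True
        else:                                 # mixed echoes: fall back to the z-threshold
            ground[sel] = z[sel] - z[sel].min() <= dz
    return ground

# Toy check: one 5 m cell, ground near 0 m plus a shrub point 3 m up (2nd return)
x = np.array([0.0, 1.0, 2.0, 3.0]); y = np.zeros(4)
z = np.array([0.0, 0.1, 3.0, 0.2]); rn = np.array([1, 1, 2, 1])
print(method2_ground_mask(x, y, z, rn))  # → [ True  True False  True]
```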

Method 2 - Sample 2:
In this case, the elapsed time was 62 seconds, much higher than the one obtained with the first method. The result is depicted in Figure 7, and the overall metrics are presented in Table 6.

Method 3: Random Forest
The third approach investigated the use of machine learning algorithms, namely the Random Forest, to determine whether the addition of the echo data really affects the final classification (Wang et al., 2019). In particular, a set of three tests was carried out for both S1 and S2, while for S2 additional tests were performed. For all the tests, various combinations of the splitting criterion used to measure the impurity of the nodes (Gini index or entropy), number of trees (100, 150 or 200) and scalers (none, scaler 1 or scaler 2) were tried, while the depth of the trees was always set to none. In the following sections, only the best results are reported. As regards the scalers, they were introduced to better address the presence of 3D features (Weinmann et al., 2015): scaler 1 standardizes features by removing the mean and scaling to unit variance, while scaler 2 scales features using statistics that are robust to outliers (Matrone et al., 2021). In particular, the considered 3D features are shown in Figure 8.

Conclusions
[…] is not fully generalizable, on the other side restraining the intrinsic potentialities of this kind of data.
As regards the use of the Random Forest, it emerged that the return number is indeed a useful feature for discriminating the ground class for DTM extraction, even if coupling it with additional features remains desirable.
A further development of this research could lie in understanding not if, but how the number of returns influences the final predictions, analysing the possible correlations between the echoes and the classes to be discriminated.

Figure 1. Overall point cloud with the subsets of points used for the tests (red for Sample 1, pink for Sample 2).
Figure 2. Sample 1: visualization of the point cloud according to the z-value scalar field (a); result of Method 1 (b).
Figure 3. Sample 2: visualization of the point cloud according to the z-value scalar field (a); result of Method 1 (b).

MATLAB code for Method 2 (mask point-selection loop):
    if x(i)>MXmin && x(i)<MXmax && y(i)>MYmin && y(i)<MYmax  % condition inside mask
        num_punti_maschera = num_punti_maschera+1;  % how many points
        ZM(num_punti_maschera,1) = z(i);  % save xyz
        XM(num_punti_maschera,1) = x(i);
        YM(num_punti_maschera,1) = y(i);
        NMM(num_punti_maschera,1) = returnNumb(i);
    end

Method 2 - Sample 1:
As hypothesized, Method 2 turns out to be faster: for S1 it processed in 1.3 seconds, almost 8 times faster than Method 1. The result (Figure 6) was compared with the manual classification, and the confusion matrix CMM2,S1 and metrics are reported in Table 5.

Considered 3D features:
• … (radius 0.6 m) (f4)
• Planarity (radius 0.6 m) (f5)
• Eigenentropy (radius 0.4 m) (f6)
• Surface variation (radius 0.6 m) (f7)

Figure 8. Scan Angle Rank (a), GPS time (b), Verticality (c), Planarity (d), Eigenentropy (e), Surface variation (f).

Table 7 contains the configurations of the features selected for the various tests. The third configuration (C3) was added to test the point cloud as directly output by the Zenmuse L1. The fifth configuration (C5), on the other hand, was investigated as a consequence of the results obtained from C3 (see § 2.3.1). The scenes were then divided into training, validation and test sets, in order to have all the classes present in each dataset (Figure 9).
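The Random Forest test grid described for Method 3 (splitting criterion x number of trees x scaler) can be sketched with scikit-learn as follows; the synthetic features and labels are stand-ins for the real per-point attributes, so the numbers carry no meaning beyond showing the plumbing of the comparison.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(0)
n = 1000
# Stand-ins for per-point features, e.g. z, intensity, return number, planarity
X = rng.normal(size=(n, 4))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # toy ground / off-ground label
X_train, X_test, y_train, y_test = X[:800], X[800:], y[:800], y[800:]

scalers = {"none": None,
           "scaler 1": StandardScaler(),  # zero mean, unit variance
           "scaler 2": RobustScaler()}    # median/IQR, robust to outliers
for name, scaler in scalers.items():
    if scaler is None:
        Xtr, Xte = X_train, X_test
    else:
        Xtr = scaler.fit_transform(X_train)
        Xte = scaler.transform(X_test)
    clf = RandomForestClassifier(n_estimators=100, criterion="gini",
                                 max_depth=None, random_state=0)
    clf.fit(Xtr, y_train)
    acc = accuracy_score(y_test, clf.predict(Xte))
```

In the actual tests, `criterion` would also be swapped to `"entropy"` and `n_estimators` to 150 and 200, mirroring the combinations listed above.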

Table 7. Configuration of the feature selection for the various tests performed.