AN EFFECTIVE APPROACH FOR POINT CLOUD DENOISING IN INTEGRATED SURVEYS

Outliers and noise in point cloud data are unavoidable due to intrinsic and/or extrinsic survey factors. Significant errors may result from false geometry produced by a collection of anomalies, compounded by the sparse structure, irregular density, and lack of geometric cohesion typical of point clouds. Thus, raw data must be filtered to produce accurate point clouds suitable for further processing. This study pursues that objective through a comparative analysis of two registered clouds: one obtained from TLS, used as the reference dataset, and the other, to be filtered, from a SLAM system. The workflow consists of four steps: analysing the geometric attributes of the compared models, specifically surface density and roughness; constructing statistical tolerance limits for the roughness distribution of the TLS cloud; cleaning the SLAM cloud; and assessing the filtering outcomes. Our efforts to remove and mitigate noise effectively, while preserving the original detail features of the object surface, have been driven by the wide range of point cloud denoising approaches introduced in recent years. However, in this wide context, our goal is not to provide a review or to explore the details of the various methods; rather, we want to offer a simple yet efficient method for obtaining an integrated model with a uniform noise level. This can be especially useful when the survey data will later be used in source-based modelling.


INTRODUCTION

1.1 Background
The use of 3D point clouds for object representation is becoming more common in many research areas (Aldoma et al., 2012; Rusu & Cousins, 2011; Saval-Calvo et al., 2015). In contrast to polygonal meshes, point clouds do not require the maintenance of topological consistency (Kobbelt & Botsch, 2004; Pfister & Gross, 2004), and therefore the processing and manipulation of these entities can offer higher performance with less effort. The rapid spread of new mobile acquisition systems integrating profilometers or time-of-flight cameras has promoted the use of these products. However, noise contamination and outliers are often present (Xie et al., 2004), mainly due to the characteristics of the sensor and how it is integrated into the overall system. Therefore, to generate accurate point clouds appropriate for additional processing, filtering operations must be performed on raw data, which are almost always plagued by problems related to intrinsic and extrinsic variables of the instrumentation. Based on these needs, several filtering methodologies have been proposed in recent years, some of which operate directly on the cloud while others require the prior processing of a mesh. The literature classifies them into seven groups (Han et al., 2017; Schall et al., 2008): (i) those based on statistics, which by their nature are well suited to point clouds (Schall et al., 2005); (ii) those based on neighbourhood, which make use of similarity measures between points (Rosli & Ramli, 2014); (iii) those based on projections, following different strategies (Lipman et al., 2007); (iv) those involving the nature of the data acquisition signal (Linsen, 2001); (v) those using Partial Differential Equations (PDE), widely applied in computer vision (Clarenz et al., 2004); (vi) hybrid methods (Liu et al., 2012); and (vii) those that do not fall into the previous groups (Szeliski & Tonnesen, 1992).

Goals
The wide range of available solutions suggests that filtering is central to a vast range of applications. However, our objective is not to conduct a review or to delve into the specifics of the different approaches, but rather to propose a robust and easily replicable workflow to produce reality-based models deriving from integrated survey operations (Alonso et al., 2016; Morena et al., 2021; Limongiello et al., 2020). The combination of systems and sensors is now a consolidated practice in the field of surveying, and it allows us to respond effectively and efficiently to the critical issues and singularities of specific applications. In fact, no technique dominates the others, and it is preferable to make up for the limitations of one by compensating with the strengths of the others. Those who work in the field of documentation and deliver multi-scale and multi-resolution models will surely have noticed that homogenizing the geometric properties of data deriving from heterogeneous sources is particularly burdensome and that there is no universally accepted pipeline to solve the problem. The approaches in fact depend on the intended use of the point clouds, which often constitute input data for subsequent analyses or for source-based modelling (Casillo, Colace, et al., 2022; Casillo, Guida, et al., 2022). The proposed methodology aims to obtain homologous models, produced with different tools and delivered in the form of point clouds, whose noise levels are optimized with respect to a reference dataset. For our application we operate on a cloud obtained with a GeoSLAM ZEB Horizon mobile system, compared with a Terrestrial Laser Scanning (TLS) model. Within the framework defined in the previous sub-section, our solution can be described as a hybrid approach where, from the roughness distribution of the TLS cloud, we construct statistical tolerance intervals to be used for filtering the SLAM model via a neighbourhood-based algorithm. This is an expeditious but still accurate approach that allows flexibility in filtering a raw cloud.

Case study
The case study considered in this work is an ancient luxurious residential complex, Villa A (the so-called "Villa of Poppea") in the Pompeian site of Oplontis, among the most significant monumental remains buried by the dramatic eruption of 79 AD (Fig. 1). The excavations are in the heart of Torre Annunziata, an urban centre close to Naples in the Campania Region of Italy. Only the "Tabula Peutingeriana", a mediaeval replica of an old Roman road map of Italy, contains references to the name Oplontis, denoting a few buildings between Pompeii and Herculaneum. The villa was built in the middle of the first century BC, expanded during the imperial era, and was being restored at the time of the eruption. Its ownership is attributed to Poppaea Sabina, the second wife of Emperor Nero, or in any case to the imperial family estate. The building has not yet been completely excavated; the area that has been revealed corresponds to the eastern part, and the recovery of the main entrance and the western area is still pending due to the presence of a military building and a modern road. One of the most important villas used for otium on the coast of the Gulf of Naples, the building had a main entrance oriented towards the countryside behind it and then developed with a succession of rooms and gardens towards the sea. Overall, the plan of the villa is very complex and still not fully explored to date.

Dataset sources
The reference model is obtained with a Continuous Wave-Frequency Modulation (CW-FM) laser scanner, the Faro Focus S S150 Plus. For the experiment, we consider a single scan acquired with a resolution of 12.3 mm at 10 m and 4 measurements for each point. The analysed model, on the other hand, comes from a GeoSLAM ZEB Horizon mobile system and from a single path, whose data are processed with a proprietary SLAM algorithm, leaving the default parameters unchanged. The clouds from the two instruments are georeferenced (UTM/ETRS00 cartographic system) through planar targets distributed across the scene and surveyed with a GNSS system. The SLAM path is also optimised on the target coordinates.

Methodology
As anticipated, the aim of the proposed methodology is to refine the noise level of a SLAM point cloud using a homologous model obtained with TLS. The workflow consists of four steps:
• evaluation of the geometric features of the compared models, in particular roughness and surface density;
• construction of statistical tolerance intervals for the roughness distribution of the TLS cloud;
• filtering of the SLAM cloud;
• evaluation of the filtering results.

2.2.1 Geometric feature computation: point cloud analyses are conducted in the CloudCompare version 2.12.4 environment on the two homologous models. For each point of these entities, the roughness value is equal to the oriented distance between the point and the best fitting plane computed on its nearest neighbours, which are selected by imposing a kernel size, i.e. the radius (R) of a sphere centred on the point. The surface density is calculated as the number of neighbours N (identified by defining the radius R of a sphere) divided by the neighbourhood surface. The central point is always included when computing this feature. Therefore, the surface density is equal to $(N + 1) / (\pi R^2)$. The properties just described are strictly related to the kernel size, and therefore a single neighbourhood is incapable of describing the local structure at different scales (Brodu & Lague, 2012; De Blasiis et al., 2020; Harshit et al., 2022). We then conduct a multi-scale assessment to identify the most appropriate value of the radius (R), which is essential for filtering operations. The assessment is performed on a substantially flat portion extracted from the two homologous clouds, taking care to control what we define as edge effects.
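The two features defined above can be illustrated with a minimal, pure-Python sketch. It is not CloudCompare's implementation: the function names are hypothetical, the neighbour search is brute force rather than octree-based, the central point is assumed to belong to the cloud, and the plane normal is obtained by power iteration purely to keep the example dependency-free. The sign of the returned roughness depends on the arbitrary orientation of the fitted normal.

```python
import math

def neighbours(points, centre, radius):
    """Points within `radius` of `centre` (brute force; includes the centre itself)."""
    r2 = radius * radius
    return [p for p in points
            if sum((a - b) ** 2 for a, b in zip(p, centre)) <= r2]

def surface_density(points, centre, radius):
    """(N + 1) / (pi * R^2): neighbours plus the central point over the disc area."""
    count = len(neighbours(points, centre, radius))  # N neighbours + centre
    return count / (math.pi * radius ** 2)

def fit_plane(pts, iterations=100):
    """Least-squares plane through pts: returns (unit normal, centroid)."""
    n = len(pts)
    c = [sum(p[k] for p in pts) / n for k in range(3)]
    cov = [[sum((p[i] - c[i]) * (p[j] - c[j]) for p in pts) / n
            for j in range(3)] for i in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # Power iteration on (trace*I - cov): its dominant eigenvector is the
    # eigenvector of cov with the smallest eigenvalue, i.e. the plane normal.
    m = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
         for i in range(3)]
    v = [1.0, 0.7, 0.3]  # arbitrary start, assumed not orthogonal to the normal
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate neighbourhood (coincident points)
            break
        v = [x / norm for x in w]
    return v, c

def roughness(points, centre, radius):
    """Signed distance from `centre` to the plane fitted on its neighbourhood."""
    pts = neighbours(points, centre, radius)
    if len(pts) < 3:
        return float("nan")  # too few points to define a plane
    normal, c = fit_plane(pts)
    return sum((centre[k] - c[k]) * normal[k] for k in range(3))
```

Repeating the computation for several values of `radius` gives the kind of multi-scale curve discussed in the text; for clouds of realistic size a spatial index would be needed in place of the brute-force search.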

2.2.2 Construction of tolerance intervals: after identifying the most appropriate value of the kernel size, we consider the roughness distribution of the TLS cloud. Our goal is to construct statistical tolerance intervals for this property and use their limit values to filter the SLAM model. These intervals contain a certain percentage of the population with a defined confidence level (Natrella, 2013). The first step is to verify the nature of the distribution. Having calculated the roughness as the oriented distance of a point from the local best fitting plane, we start from the hypothesis that the distribution is normal and perform the Anderson-Darling test, given that the sample size is greater than 5000 elements. If the hypothesis is verified, we proceed to calculate the normal tolerance limits. Otherwise, we look for a normalizing transformation and, if an acceptable one exists, we calculate the normal tolerance limits for the transformed data and then transform them back to the original scale. If this approach also fails, we search for an alternative distribution and, in case of a good fit, we calculate the limits on that distribution. In the case of a parametric approach, it is essential to verify a posteriori that the sample size is large enough to allow the fitting of a specific distribution. If all approaches fail, we use a non-parametric one, taking care to filter the data beforehand to remove outliers (for example by building a box plot). For optimal filtering we construct intervals for different population percentages and confidence levels, comparing the results. Note that only with a parametric approach is it possible to fix both values in advance; otherwise, we can fix only one and verify the other a posteriori.
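The first step of this procedure, the normality check, can be sketched as follows. This is a minimal standard-library illustration, not the tool used in the study: the function names are hypothetical, the statistic uses the usual small-sample adjustment for the case where mean and standard deviation are estimated from the data, and the 5% critical value of 0.752 is the commonly tabulated one for that case (an assumption about the significance level, which the text does not state).

```python
import math
from statistics import NormalDist, fmean, stdev

def anderson_darling_normal(sample):
    """Adjusted Anderson-Darling statistic for normality (mean and sigma estimated)."""
    y = sorted(sample)
    n = len(y)
    fitted = NormalDist(fmean(y), stdev(y))
    eps = 1e-12  # keeps log() finite in the extreme tails
    cdf = [min(max(fitted.cdf(v), eps), 1.0 - eps) for v in y]
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(y_i) + ln(1 - F(y_{n+1-i}))]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)

CRITICAL_5_PERCENT = 0.752  # assumed significance level of 5%

def looks_normal(sample):
    """True when the normality hypothesis is not rejected at the assumed level."""
    return anderson_darling_normal(sample) < CRITICAL_5_PERCENT
```

If `looks_normal` returns False for the roughness values, the procedure above moves on to a normalizing transformation, an alternative distribution, or a non-parametric interval.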

2.2.3 SLAM filtering: for each point of the SLAM cloud, the selected algorithm uses a sphere (of which the radius R must be defined) to perform the local fitting of a plane; the point is then removed if its distance from the plane is greater than a predetermined value. For the search radius R we use the value resulting from the multi-scale assessment on the geometric features of the SLAM cloud. Regarding the threshold value, we instead use the tolerance limits calculated for the roughness distribution of the TLS cloud, calculated for an appropriate kernel size.

Figure 1. Villa A in the site of Oplontis.

2.2.4 Filtering evaluation: the first features we take as reference are the roughness and the surface density of the SLAM cloud, checking how they change after filtering. There is no reference limit value, and this type of analysis must be performed in relation to the intended use of the model (for example, production of 2D drawings or source-based modelling). A second approach examines the distance measures between the reference model (TLS) and the analysed one (SLAM). In the literature there are many methods to perform this operation, more or less sensitive to the different sources of uncertainty that can be encountered when comparing point clouds (James et al., 2017; Lague et al., 2013). One of the most refined uses the M3C2 algorithm, which however is poorly suited to the checks we performed. In fact, due to its structure, it is not very sensitive to noise and outliers that can influence the comparison. For this reason, we opt for a direct Cloud-to-Cloud comparison with the closest point technique (C2C). This is the simplest and fastest direct 3D comparison method for point clouds, as it requires neither gridding or meshing of the data nor calculation of surface normals. Due to its simple nature, especially suitable for capturing rapid changes in direction between two entities, it is very sensitive to factors such as roughness, outliers, and density, which are central to our investigation. We therefore choose this method, using the TLS cloud as the reference and analysing the SLAM cloud in filtered and unfiltered forms.

Geometric feature computation
The multi-scale assessment is performed with a kernel size ranging from 0.5 to 50.0 cm, a range consistent with the analysis of the textures of architectural elements (Tab. 1). For the homologous clouds we extract a surface of approximately 16 m², generally flat, avoiding the junction edges of the walls and floors, which would contaminate the analysis. Another critical aspect taken into consideration is represented by the so-called edge effects. In the boundary areas of the extracted cloud portions, the neighbour-search sphere partly falls outside the analysed region, and the number of points fitted on a plane decreases as the edge is approached. For this reason, after computing the geometric features, we exclude an area with a width of 50 cm starting from the boundary, corresponding to the maximum kernel size. After these premises we move on to observing the graphs. In the case of SLAM, the roughness and density values depend on the choice of kernel size, especially when the search radius R takes on small values (Fig. 2). For roughness, whose distribution is normal, or normalized, with an essentially zero mean, we study the link between the standard deviation (σ) and the kernel size. For the surface density, a positive definite quantity, we make no assumptions about the distribution and consider its mean value as R varies. To interpret the data, we use a piecewise polynomial approximation (Fig. 3). In the graphs of both features we identify a branch with a linear trend, for the smallest values of the radius, and another which can be approximated by a second-degree polynomial. The goodness of fit was evaluated through a coefficient: the coefficient of determination in the linear case and an adapted version, known as the pseudo-coefficient of determination, in the polynomial case. The latter certainly has limitations compared to its linear counterpart but can still provide a fairly accurate evaluation. We identify the breakpoint, where there is a strong variation in slope, as the intersection of the functions that approximate the two branches. At this point we read the reference value of the kernel size: 5.1 cm for roughness and 4.8 cm for density. We believe it is legitimate to assume 5 cm as a reference value for all investigations. For the TLS cloud we observe feature values that are essentially constant as the radius varies. In the case of roughness, we observe a slight increase as R increases, since the calculation approximates as flat surfaces that are not. Anomalous values are observed when R is very small, because we work with values lower than the spatial resolution of the scans. To a lesser extent the same phenomenon can also be observed for SLAM. In conclusion, we also use a radius of 5 cm for the TLS.
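The breakpoint described above can be located analytically once the two branches have been fitted. The sketch below assumes the fitted coefficients are already available; the function name, the argument layout, and the search interval are hypothetical, and the coefficients used in the usage line are invented for illustration and are not those of the study.

```python
import math

def breakpoint_radius(line, quad, r_min, r_max):
    """Intersection of a1*R + b1 and a2*R^2 + b2*R + c2 inside [r_min, r_max].

    `line` is (a1, b1) and `quad` is (a2, b2, c2); returns None when the two
    branches do not cross within the interval.
    """
    a1, b1 = line
    a2, b2, c2 = quad
    # Equating the branches gives a2*R^2 + (b2 - a1)*R + (c2 - b1) = 0.
    a, b, c = a2, b2 - a1, c2 - b1
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return None
    root = math.sqrt(disc)
    candidates = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    inside = [r for r in candidates if r_min <= r <= r_max]
    return min(inside) if inside else None

# Illustrative coefficients only: y = R and y = R^2 - 2 cross at R = 2.
print(breakpoint_radius((1.0, 0.0), (1.0, 0.0, -2.0), 0.0, 10.0))
```

Restricting the roots to the range of radii actually sampled avoids picking a mathematically valid intersection that lies outside the fitted data.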

Construction of tolerance intervals
After identifying the reference value for the kernel size, we proceed with the analysis of the TLS roughness distribution. We first verify that the normal distribution hypothesis is valid using the Anderson-Darling test. Having obtained a positive result, we calculate the two-sided tolerance limits. Starting from the mean (m) and standard deviation (σ) of the sample, we can use an interval of the form m ± Kσ. Since both m and σ vary from sample to sample, it is impossible to determine K so that the limits m ± Kσ always include a specified proportion P of the underlying normal distribution. It is, however, possible to determine K so that, in a long series of samples from the same or different normal distributions, a definite proportion γ of the intervals m ± Kσ will include P or more of the underlying distribution. In our case the mean value is zero. Table 2 reports the tolerance limits for a confidence value γ = 0.99 and for three different percentages of the population P.
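As an illustration of how K can be obtained, the sketch below uses Howe's closed-form approximation for the two-sided normal tolerance factor, with the chi-square quantile replaced by the Wilson-Hilferty approximation so that only the standard library is needed. Both approximations are assumptions of this example (the text does not state how K was computed); they are reasonable for the large samples considered here but should not be relied on for small n. The function names and the numerical inputs in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation of the chi-square quantile (large dof)."""
    z = NormalDist().inv_cdf(p)
    t = 2.0 / (9.0 * dof)
    return dof * (1.0 - t + z * math.sqrt(t)) ** 3

def k_factor(n, coverage, confidence):
    """Approximate two-sided tolerance factor K for sample size n (Howe)."""
    dof = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2_low = chi2_quantile(1.0 - confidence, dof)  # lower-tail quantile
    return z * math.sqrt(dof * (1.0 + 1.0 / n) / chi2_low)

def tolerance_limits(mean, sigma, n, coverage, confidence):
    """Limits m - K*sigma and m + K*sigma for proportion P and confidence gamma."""
    k = k_factor(n, coverage, confidence)
    return mean - k * sigma, mean + k * sigma

# Hypothetical inputs: zero mean, sigma = 2 mm, 10000 points, gamma = 0.99.
for p in (0.7, 0.95, 0.999):
    print(p, tolerance_limits(0.0, 0.002, 10000, p, 0.99))
```

For large samples K approaches the plain normal quantile, so the limits are driven mainly by the chosen proportion P rather than by the confidence level.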

SLAM filtering
Once the tolerance limits have been calculated, we can establish the threshold values for filtering. Having also defined the reference kernel size, we can then apply the algorithm described in the methodology paragraph.
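The thresholding step itself is simple once each SLAM point has a signed distance from its local best fitting plane. The sketch below assumes those distances have already been computed at the reference kernel size and that the tolerance limits are given; the function name and the sample values are hypothetical, and treating points without a valid plane fit (NaN) as removed is a choice of this example rather than something stated in the text.

```python
def filter_by_tolerance(points, plane_distances, lower, upper):
    """Keep points whose plane distance lies inside [lower, upper].

    Returns the kept points and the fraction of points removed. Points whose
    distance is NaN (no valid local plane) fail the comparison and are dropped.
    """
    kept = [p for p, d in zip(points, plane_distances) if lower <= d <= upper]
    removed_fraction = 1.0 - len(kept) / len(points) if points else 0.0
    return kept, removed_fraction

# Hypothetical values: limits of +/- 5 mm on four points.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
distances = [0.001, -0.004, 0.020, float("nan")]
kept, removed = filter_by_tolerance(cloud, distances, -0.005, 0.005)
print(len(kept), removed)
```

Reporting the removed fraction alongside the kept points makes it easier to compare the effect of different population percentages on cloud size.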

Filtering evaluation
The first evaluation examines the roughness and surface density. Having set the reference kernel size equal to 5.0 cm, we recalculate these features for the SLAM cloud after performing the filtering for different values of the tolerance limits, taking care to control the edge effects described in subparagraph 3.1. Table 3 shows the results. The second check involves calculating the C2C distance between the SLAM cloud, in unfiltered and filtered forms, and the TLS cloud used as a reference. For our application we impose a maximum search distance equal to 15 cm and an octree level equal to 7, and we do not use any local modelling of the reference, the TLS cloud being sufficiently dense. To evaluate the effectiveness of filtering, we refer to the histograms of the distances obtained for the three tolerance limits (Fig. 4). We also report three slices (2.0 cm) of the relevant models for visual verification, especially at the intersecting edges of architectural elements, where an incorrect setting of the filtering parameters is most likely to produce data loss.
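The closest-point comparison can be sketched in a few lines. This is a brute-force illustration, not the octree-based implementation used in the study: the function name is hypothetical, no local modelling of the reference is applied, and distances beyond the maximum search distance are simply reported as that maximum, which is an assumption about how saturated values are handled.

```python
import math

def c2c_distances(compared, reference, max_dist=0.15):
    """Distance from each compared point to its closest reference point.

    Values larger than `max_dist` (metres) are capped at `max_dist`.
    """
    distances = []
    for p in compared:
        nearest = min(math.dist(p, q) for q in reference)
        distances.append(min(nearest, max_dist))
    return distances

# Hypothetical clouds: two reference (TLS) points, three compared (SLAM) points.
tls = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
slam = [(0.0, 0.0, 0.05), (1.0, 0.1, 0.0), (5.0, 0.0, 0.0)]
print(c2c_distances(slam, tls))
```

Running the function on the unfiltered and filtered SLAM clouds and comparing the two lists of distances reproduces, in miniature, the histogram comparison described above.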

DISCUSSIONS AND CONCLUSIONS
In this paper we investigate the challenges of denoising techniques in unorganized point clouds. Although a body of research on point cloud filtering exists, filtering the raw point cloud, a crucial step of the point cloud processing pipeline, remains a challenging task. Our work has the specific objective of outlining a workflow to homogenize the roughness of homologous models obtained with different techniques and technologies. It appears clear, from the literature itself, that there is no univocal procedure and that the operations depend on the objectives and on the features of the analysed data. In our application we limit the investigation to an integrated survey that combines the TLS technique with a more recent mobile system based on the SLAM approach. We believe that in a filtering operation it is important to monitor the change in the geometric features of the model in order to judge whether the results of the procedure are acceptable. It is equally true that the procedure itself can be influenced by these aspects, which are in turn related to the distinctive characteristics of the survey technique, the acquisition project and other factors. For this reason, we attribute great importance to the preliminary study of the features and of how they can influence the parameters that govern the procedure. Since many of the algorithms we use are neighbourhood-based, we search for the most appropriate kernel size for the SLAM cloud, identifying the reference value of 5 cm, which guarantees results not contaminated by the local properties of the cloud itself.
Regarding the filtering threshold, we study the tolerance limits of the roughness distribution for the TLS cloud. We believe this is an effective approach if the goal is to homogenize the noise levels of homologous models. In detail, once the confidence value is fixed, we construct intervals for different percentages of the population. At the same time, we monitor the geometric features to identify the most appropriate values. Although we cannot unequivocally identify the best solution, we can make some considerations. Looking at the results of the three experiments conducted (Fig. 5), we can conclude that the tolerance limits for P = 0.999 generate the best compromise between cloud size (density) and roughness level. A population percentage P = 0.7, in fact, improves the noise level but excessively reduces the number of points, while P = 0.95 does not offer consistent improvements compared to P = 0.999. These are general considerations that are independent of the specific use of the model. In the case of source-based modelling, for example, it might be useful to start from a lighter reality-based model. In this case, once the surface density level is fixed, our study allows us to quickly check the surface roughness and evaluate whether the noise level is compatible with the processing. Unfortunately, the results achieved cannot be automatically extended to all campaigns that use the same tools. The properties of the models, in fact, strongly depend on the individual acquisition campaign. This is particularly true for mobile systems: in addition to the distance between the object and the operator, we must consider the speed of travel, the orientation of the instrument and other factors. Despite this, the proposed workflow is not particularly expensive in terms of time and computational resources and can be easily adapted to specific applications. Future developments will focus on extending the results achieved to the treatment of outliers.

Figure 3. Piecewise polynomial approximation for geometric features of the SLAM cloud.

Figure 2. Multi-scale evaluation for roughness and surface density of SLAM and TLS clouds.

Figure 4. Variations of C2C absolute distance for different tolerance limits.

Table 1. Feature evaluation versus kernel size.