Geometrically guided and confidence-based point cloud denoising

The generation of photogrammetric point clouds from satellite images is often based on image correlation techniques. Correlation errors can arise for a wide variety of reasons: transient objects, homogeneous areas, shadows, and surface discontinuities. Therefore, a simple 3D Gaussian distribution at the point cloud level is not an appropriate model. In this paper, we propose a new point cloud denoising method integrated into the Multiview Stereo Pipeline CARS, dedicated to satellite imagery. Building upon bilateral filtering principles, our approach introduces a novel utilization of color information, confidence estimation and geometric constraints alongside point positions and normals. While the use of point color increases the level of detail, the addition of geometric constraints and confidence awareness guides processing towards a realistic solution. We propose an ablation study and compare our solution against a previously established bilateral filter with LiDAR data as ground truth.


Introduction
As part of the CO3D mission (Lebègue et al., 2020), carried out in partnership with Airbus, CNES is developing the image processing chain including the open source Multi View Stereo (MVS) pipeline CARS (Youssefi et al., 2020), available under the Apache-2.0licence.By acquiring land areas within two years, providing 4 bands (Blue, Green, Red, Near Infra Red) at 50 cm, the objective is to produce a global Digital Surface Model (DSM) with 1 m relative altimetric error (CE90) at 1 m ground sampling distance (GSD) as target accuracy.The worldwide production of this 3D information will notably make a real contribution to the creation of digital twins (Brunet et al., 2022).
Satellite imagery provides global coverage, which unlocks the possibility to update the 3D model of any location on Earth within a rapid time frame.However, due to the smaller number of images or lower resolution than drone or aerial photography, a denoising step is necessary to extract relevant 3D information from satellite images.Denoising aims to smooth out flat elements in the observed scene (rooftops, roads) while retaining their sharp edges that are sometimes barely recognizable relative to the sensor resolution, such as the contour of small houses or the narrow gaps between them.
After a brief review of existing denoising methods, the motivation for using a MVS pipeline for our use case is presented in section 2. Section 3 describes in detail the proposed denoising method, its integration into an MVS and its differentiating features compared with the method from which it is derived.After a description of the data and reference used to qualify the method in section 4, section 5 shows how the point cloud is affected by the various modifications made to bilateral filtering.

Related work
Point cloud denoising is a topic widely studied in 3D reconstruction.Previous works have proposed two possibly consecutive steps, outlier removal and noise removal.The first step aims to remove out-of-distribution points.The second one, rather than removing the points, aims to move them closer to the surface.In case of noise removal, several methods, ranging from classical (Oztireli et al., 2009) to deep learning-based (Rakotosaona et al., 2020) have been designed over the past decades.The methods often rely on little a priori information because they can be used on point clouds generated from imagebased reconstruction techniques, with low-cost depth sensors like Microsoft Kinect or via a high-resolution 3D scanner such as LiDAR.
With the advent of the new NeRF and 3D Gaussian Splatting reconstruction methods, which represent a real breakthrough in 3D field (Mildenhall et al., 2020, Kerbl et al., 2023), we might think that denoising is less and less necessary.Although, these methods are beginning to be successfully adapted to satellite imagery (Derksen andIzzo, 2021, Marí et al., 2022), given the computation times of these new methods, the reconstruction pipelines from satellite imagery belonging to the MVS pipelines are still the most suitable for massive data production, winning the 2016 IARPA (Facciolo et al., 2017) and 2019 Data Fusion Contest (DFC) challenges (D'Angelo et al., 2019).Furthermore, many of these methods, including 3DGS, rely on an initial point cloud.Others (Marí et al., 2022) use a point cloud derived from photogrammetry as a prior to improve the quality of the 3D reconstruction results.In any case, these more costly methods may also benefit from denoised point clouds as inputs.
Several MVS pipelines have been developed and can be used to produce point clouds from satellite imagery (Krauß et al., 2013, de Franchis et al., 2014, Qin, 2016, Rupnik et al., 2017, Beyer et al., 2018).CARS1 stands out by being designed for maximize scalability robustness and performance, while going through the same steps as the state-of-the-art tools (Michel et al., 2020): rectification, stereo matching and triangulation.Since denoising is applied to the point cloud, it takes place after the triangulation stage (see Figure 1).SGM regularization (Hirschmuller, 2008) is often used, but imposes a trade-off between regularization and accepting disparity jumps: smooth surfaces can therefore remain noisy, hence the need for an ad-hoc denoising method developed for satellite imagery.Figure 2 shows an example of mesh before and after denoising.

Methodology
The proposed method is built upon bilateral filtering from (Digne and de Franchis, 2017).Julie Digne's bilateral filtering is an adaptation to point clouds of the filtering developed for meshes (Fleishman et al., 2003).This filtering only needs point positions and normals and an efficient implementation has been proposed that offers a good compromise between speed and quality.Figure 3 shows the contribution of neighbors in the bilateral filtering framework: Mathematically, each point p is moved to p ′ along the normal vector np estimated from q points from neighborhood N (p): where the displacement δp is expressed as follows: and where weights functions w are centered Gaussian functions: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-4/W12-2024 FOSS4G (Free and Open Source Software for Geospatial) Europe 2024 -Academic Track, 1-7 July 2024, Tartu, Estonia We have chosen to integrate new constraints into this filtering.
Our aim is to show how a priori knowledge can be used to guide denoising and, above all, to produce a denoised point cloud that is more consistent with the acquisition conditions or metrics obtained during correlation.This new method takes into account two important constraints.
The first is a geometric constraint.The input to the denoising step is a point cloud from photogrammetry resulting from matched points on the sensor images.Our pipeline CARS derives lines of sight from these matched points and the intersection of these lines given the target 3D positions.In our method, when we denoise this point cloud, the points are constrained to stay on their initial line of sight.This has two main advantages: firstly, the associated color will remain consistent with the new position.Secondly, points will not accumulate in certain spaces and create empty areas.This would be the case if the point was recolored following an approach like (Zhang et al., 2019).
The second constraint comes from the correlator PANDORA4 .The software is available under the Apache-2.0licence.The article (Sarrazin et al., 2021) describes a confidence metric, named ambiguity integral metric, to assess the quality of the produced disparity map.This measurement determines the level of confidence associated with pairs of homologous points.Each point is moved along the line of sight according to its confidence: the less confident the correlator, the more the point is moved while respecting the geometric constraint mentioned earlier.Apart from these two added major constraints, our method still uses the usual denoising parameters, such as the initial color and position of each considered point regarding its neighborhood.Normal smoothing is included to compensate correlation inaccuracy.Figure 4 shows the benefits of considering these additional constraints.

Experimental setup
An urban area of Nice (France) is selected as interest site for our study.We extract a 800x500 region from a couple of Plei- (c) New regression plane estimation (d) New normal distances as dotted lines and associated weights ades HR stereo images given as input for a CARS production.Default CARS configuration is applied, except for the DSM output resolution, set to 1m, to match CO3D product characteristics.For the stereo matching step, we use the census filter of (Zabih and Woodfill, 1994) along with a semi-global optimization from (Hirschmuller, 2008).
The denoising filter takes a point cloud as input.With the cropped stereo image couple as input, a first CARS production provides intermediate products, disparity map (epipolar coordinates) and point cloud dataset (earth-centered, earth-fixed coordinates).A final production generates the DSM of our region of interest (Universal Transverse Mercator zone 32N projection).For later quality evaluation of our denoised point cloud dataset, reference data is generated for each CARS output based on LiDAR HD® 5 .This freely distributed data contains 10 points per m 2 and includes a semantic label for each point, allowing for a class-specific quality assessment according to building, vegetation or ground.Using this external data a reference disparity map is then generated using a planimetric and altimetric readjustment between LiDAR and CARS DSM, followed by a ground truth disparity map using height-disparity ratio as in (Cournet et al., 2020).CARS production is reiterated with a breakpoint before triangulation to inject this new reference dis-5 LiDAR HD® website: geoservices.ign.fr/lidarhd The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-4/W12-2024 FOSS4G (Free and Open Source Software for Geospatial) Europe 2024 -Academic Track, 1-7 July 2024, Tartu, Estonia parity map.This new production enables the retrieval of a reference point cloud dataset and a final reference DSM, which integrates building, vegetation and ground classification as additional data.
Our bilateral filtering requires some user settings, similar to the one proposed by Julie Digne (Digne and de Franchis, 2017), which is parameterized by N (number of iterations), r (radius of the neighbourhood search) and Gaussian weights for euclidean distance and distance to the tangent plane.The following configuration results from a sensitivity study: • Number of considered neighborhoods for each point, K = 10 (we consider each point and its 99 neighbors); • Max iteration number = 10; • Standard deviation of Gaussian weighting factor for distance, sigma d = 2.0 (points located at more than 4.3m have a weighting factor under 10%); • Standard deviation of Gaussian weighting factor for color, sigma c = 60 (radiometric differences with point of interest that exceed 128.76 have a weighting factor under 10%); • Standard deviation of Gaussian weighting factor for ambiguity, sigma a = 0,2 ( ambiguity value above 0,5 has a weighting factor under 10%).
Two main steps integrate these parameters.Normal vectors are calculated at the beginning of the iterations for each point.Each vector is determined according to its neighborhood and selected attributes.Point repositioning is realized using these previously calculated vectors and point attributes in the neighborhood.To avoid the influence of normal vector quality on results due to selected parameters upon which they calculated, we fixed this normal vector calculation to an estimation using distance and color weight.To assess the contribution of each factor, we perform point repositioning for different configurations: • only distance weighting; • distance and color weighting; • distance and ambiguity weighting; • distance, color and ambiguity weighting; • all four configuration with projection onto the line of sight.
The output of the filter is evaluated quantitatively by means of global statistics on euclidean distances between denoised and reference point clouds and qualitatively using CloudCompare, a 3D point cloud processing software, and QGIS.
We consider our second constraint, the normalized ambiguity metric, as an indicator of correct point positioning in the CARS point cloud dataset.Using this criterion as a new filter parameter introduces the possibility of selecting points to relocate.This approach differs from other filter methods in literature where all points are affected.But this advantage of not moving already well positioned points broaches the first issue of determining how to classify points in both categories: to move or not to move.Using a strict threshold can create a lack of anchor points in some less dense or less qualitative datasets whereas a flexible threshold can be too dependent on point cloud ambiguity distribution.Therefore, we realized a correlation study between point ambiguity values and euclidean distances to their reference points for points considered as being part of buildings.
The results, visible in Figure 5, show that for decreasing ambiguity value, the gap between reference and initial point cloud dataset decreases.However, this correlation is only valid for points with ambiguity below 0,5 and contains a high number of outliers.Therefore, for points with ambiguity below 0,5, point repositioning will decrease with ambiguity whereas for points above the threshold, no weight will be assigned.We adapt this weighting factor so that areas of missing ≪ambiguity anchor points≫ that might be created by the strict threshold can be compensated by other filter factors such as distance and/or color.
Figure 5. Euclidean distance error distribution between initial CARS and reference point cloud according to their considered confidence (1 -ambiguity) range.

Results
Figure 6 shows the results varing the information added in the denoising step.Visually, color is the added information that seems to bring the most change.The other information added brings a few changes to the roof edges.during the first iteration, the number of points that are within 1 meter of the reference increases.However, it seems that the smoothing effect of distance with increasing iterations is attenuated when using ambiguity.Adding color seems to contain points within some limited range.CE90 is the lowest of all configurations and using ambiguity does not impact its value as for distance criterion.We see a small positive impact on median with ambiguity but color might have a predominant effect over ambiguity for our current filtering parameters.The minor im-pact of ambiguity might be explained by its spatial distribution.Low ambiguity regions are mostly located on top of buildings, compared to high ambiguity regions present on the ground.In these regions, ambiguity contribution tends to be poor because all points in neighborhood have similarly high ambiguity values.It is important to balance the contribution of the ambiguity and color constraints.If the colors are ignored, the resulting point cloud will have the same noise effects as the initial one.
Table 1 reveals the results obtained with and without projection of filtered points onto their respective line of sight.A behavior relatively similar to that observed without projection can be noted.Overall, projection onto the line of sight does not worsen the performance of bilateral filtering.A slight improvement in the best points is even noticeable.
As a reminder, points displaced by bilateral filtering undergo orthogonal projection onto their line of sight to ensure consistency with the acquisition geometry of these points (see Figure 4).In comparison, traditional bilateral filtering can move filtered points away from their observation sources.Obviously (by construction) this difference is zeroed when projected onto the lines of sight.This ensures superimposability with the other layers of the CARS product (color, confidence).
In table 1, it can also be noted that the quality of denoising with projection onto the line of sight deteriorates with the number of iterations.This behavior echoes that of the initial bilateral filtering (e.g., without projection), which seems to converge towards a very smooth surface.These results show that the denoising parameterization can still be improved to better preserve height discontinuities.

Conclusion
In this contribution, we have recalled the challenges of point cloud denoising while respecting acquisition conditions, within the framework of the CO3D mission.Building upon the work of (Sarrazin et al., 2021), we have proposed preliminary work towards integrating the so called "ambiguity", provided by the correlator, into bilateral filtering denoising.Once again, we have observed the correlation between strong ambiguity and positioning error.However, the results indicate that further studies are needed to adjust the contributions of distance (neighborhood), membership to the same object (following the hypothesis that there exists a strong correlation between color and an object and then between an object and a planar surface), and the ambiguity attributed to the 3D position of a point in the 3D point cloud.
We have also demonstrated that it is possible, without worsening of results, to impose a strong constraint on point displacement along the line of sight with a simple orthogonal projection.Point cloud denoising then resembles a post-processing step on the disparity map, preserving coherence between the point source (sensor) and the restitution of their 3D position.
Future work will incorporate a more precise notion of the permissible displacement of a point in space (Malinowski et al., 2024), as well as the replacement of color with object mapping, building upon the works presented in (Dumas et al., 2022).

Figure 1 .
Figure 1.Position of the denoising step in the full CARS pipeline

Figure 2 .
Figure 2. Example of mesh from points cloud before (a) and after (b) denoising.Generated from a pair of images (c) and (d) acquired by the Pleiades 1B satellite (Nice, France).
Figure 3. Neighbors weights in the denoising process (high opacity = large weight in (b) and (d)).Point p corresponds to the intersection between lines l1 and l2.Ground truth is represented by colored lines corresponding to the radiometry of the building to be reconstructed.
Figure 4. New information added in the denoising process (high opacity = large weight in (a), (b) and (d)).Changing the weights of neighbors modifies the regression plane estimation.Point p is reprojected onto the reference line of sight to ensure its original coherence (e).

Figure 6 .
Figure 6.Mesh filtered without using color (a) with color (b) with color and ambiguity (c) with color, ambiguity and projection onto the line of sight (d).

Table 1
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-4/W12-2024 FOSS4G (Free and Open Source Software for Geospatial) Europe 2024 -Academic Track, 1-7 July 2024, Tartu, Estonia summarizes the results obtained by integrating the ambiguity constraint.Statistical observation shows no obvious impact of ambiguity upon strict color or distance criterion.Indeed,

Table 1 .
Quantitative contribution of the proposed filtering method settings: taking color and confidence into account and forcing the point to stay on the reference line of sight.The error is calculated by comparison with the LiDAR HD®.The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-4/W12-2024 FOSS4G (Free and Open Source Software for Geospatial) Europe 2024 -Academic Track, 1-7 July 2024, Tartu, Estonia