Separate and Integrated Data Processing for the 3D Reconstruction of a Complex Architecture

In the last few years, data fusion has been an active research topic because of the expected advantages of exploiting and combining different but complementary techniques for 3D documentation. The data fusion process consists of merging data coming from intrinsically different sensors and platforms to produce complete, coherent and precise 3D reconstructions. Although extensive research has been dedicated to this task, many gaps remain in the integration process, and the quality of the results is still insufficient in several cases. This is especially evident when the integration occurs at a late stage, e.g., when merging the results of separate data processing. New opportunities are emerging thanks to some proprietary tools that can jointly process heterogeneous data, particularly image- and range-based data. This article investigates the benefits of data integration at different processing levels: raw, middle and high. The experiments specifically explore the results of the integration on large and complex architectures.


INTRODUCTION
In recent years, the use of static or mobile platforms with active or passive sensors and the integration of multi-modal information have brought undisputed advantages, in particular for the 3D reconstruction of complex heritage monuments and scenes (Remondino et al., 2013; Remondino et al., 2018; Lin et al., 2019; Adamopoulos and Rinaudo, 2021; Treccani et al., 2024). The use of both active and passive sensors is already common practice for the restoration of complex architectures, where geometric accuracy for structural analysis and high-resolution orthoimages are of utmost importance. However, a real integration (or fusion) of sensors and data, one that yields more complete and detailed 3D models in terms of both geometry and texture by exploiting the intrinsic benefits of each sensor and technique, remains difficult to achieve. Despite the promising results of such data fusion, several issues generally arise from the joint processing of these datasets: large perspective changes, different scales, different illumination conditions and varying point density can deeply affect the fusion results and, consequently, the 3D products. Photo-realistic and accurate 3D models, in turn, allow monitoring operations to be performed and the actual conservation state to be studied and preserved for future generations.

Paper aims
The aim of this work is two-fold:
• to review data/sensor fusion methods for the 3D documentation of complex architectures;
• to investigate how beneficial (or not) it is to integrate TLS and photogrammetry for the 3D digitization of complex architectures.
Contrary to other approaches, we try to merge data coming from different sensors (e.g., laser scanner and camera) at middle and raw level, and not only at the end of the separate data processing (high level) (Figure 1). In some works, these middle- and raw-level fusions are called hybrid adjustment (Yadav et al., 2023).
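To make the notion of a hybrid adjustment more concrete, a generic textbook-style formulation is sketched below. It is an illustration under stated assumptions, not the internal model of Metashape or Reality Capture, which is not documented: image observations $\mathbf{x}_{ij}$ of tie points $\mathbf{X}_j$ in image $i$ are combined with a term pulling the tie points towards the TLS surface $\mathcal{S}$, weighted by a hypothetical factor $\lambda$.

```latex
\min_{\{\mathbf{K}_i,\,\mathbf{R}_i,\,\mathbf{t}_i\},\,\{\mathbf{X}_j\}}
\;\sum_{i,j} \left\lVert \mathbf{x}_{ij} - \pi\!\left(\mathbf{K}_i,\mathbf{R}_i,\mathbf{t}_i,\mathbf{X}_j\right) \right\rVert^2
\;+\; \lambda \sum_{j} d\!\left(\mathbf{X}_j,\mathcal{S}\right)^2
```

Here $\pi$ is the camera projection, $\mathbf{K}_i$, $\mathbf{R}_i$, $\mathbf{t}_i$ are the interior and exterior orientation of image $i$, and $d(\cdot,\mathcal{S})$ is a point-to-surface distance. In a middle-level setting the co-registered TLS surface $\mathcal{S}$ would be held fixed; at raw level the individual scan poses would become additional unknowns.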
Figure 1: The data fusion concept presented in this paper. Data can be combined and processed at raw level (i.e., immediately after their acquisition), during the processing (middle level) or once the separate processing pipelines are concluded (high level).

STATE OF THE ART
For many years, aerial, UAV and terrestrial surveys (imaging or laser scanning) have been combined in order to exploit the intrinsic advantages of each platform and technique, as well as to overcome possible bottlenecks of a single method. Fusion techniques (Khaleghi et al., 2013; Lahat et al., 2015; Ramos and Remondino, 2015; Jusoh and Almajali, 2020) refer to the combined use of platforms and sensors, the joint processing of different data, or the combination of 3D results coming from different processes. Nowadays, sensor/data fusion is common practice in the 3D documentation of cultural heritage (El-Hakim et al., 2007; El-Hakim et al., 2008; Guidi et al., 2009; Remondino et al., 2009a,b; Chane et al., 2013; Puete et al., 2018; Patrucco et al., 2020; Sutherland et al., 2023; Grifoni et al., 2024), but also in urban mapping (Toschi et al., 2018; Megahed et al., 2021) and environmental analyses (Tonolli et al., 2011; Zieher et al., 2018). Farella et al. (2020) presented a fusion solution for UAV and terrestrial images that uses geometric metrics and statistical analyses of quality-feature distributions to identify suitable filtering thresholds. Toschi et al. (2021) proposed an advanced method to merge LiDAR and photogrammetric point clouds in urban aerial mapping using sensor-specific quality features. Pamart et al. (2023) proposed the Multimodal Enhancement Fusion Index (MEFI) to improve data fusion and the 3D digitization process. Raw- and middle-level data fusion for 3D reconstruction has mainly been proposed for aerial LiDAR point clouds and images, exploiting the available sensor trajectory (Glira et al., 2019; Jonassen et al., 2023).

CASE STUDY AND DATA ACQUISITION
The data used to evaluate the fusion methodologies depict the Church of Santa Maria di Loreto in Rome (Italy), situated near Trajan's Column, an exemplary piece of High Renaissance architecture (Figure 2). Designed by Antonio da Sangallo the Younger in the early 16th century, it was completed by Jacopo del Duca in 1582. The church is renowned for its elegant circular plan, a rarity in church design. The structure is crowned with a majestic dome reflecting the influence of Bramante and Michelangelo, who pioneered the use of such domes in Rome. The ribbed dome is a defining feature and is supported by a drum articulated by pilasters and windows, lending it a rhythmic verticality. The façade of Santa Maria di Loreto is characterized by its classical order, with Corinthian pilasters and entablatures that imbue it with a sense of harmony and proportion. The façade is divided into two orders: the lower order is more robust and grounded, while the upper order is more delicate, culminating in a pediment that adds to its classical grandeur. The interior of the church is equally impressive, with a richly decorated coffered ceiling, golden mosaics, glossy marble pilasters and a sumptuous high altar. The use of marble, stucco and gilding inside creates a vibrant and opulent atmosphere, and frescoes and sculptures by prominent artists add layers of visual and emotional depth to the architectural experience.
Based on the end-user requirements, the 3D survey of the church was performed using both terrestrial laser scanning (TLS) and photogrammetry (Table 1). This makes it possible to take full advantage of the unique strengths of each technology, ensuring a comprehensive and detailed documentation of the structure. TLS was required since it is considered a highly reliable method for detailed measurements and it creates an accurate base model for structural analysis and conservation. The TLS survey was carried out using a Leica P40 for the exterior (point position accuracy of 3 mm at 50 m) and a Faro Focus Premium 70 for both the interior and the exterior (point position accuracy of 3.5 mm at 25 m) of the church.
Photogrammetry, on the other hand, was necessary for capturing the textures and colours of surfaces, offering high-resolution imagery that can be used for detailed orthoimages, visual inspections and the mapping of conservation and restoration activities on decorated surfaces such as frescoes, stucco and mosaics. For this reason, the TLS campaign did not include capturing photos to colourize the TLS point cloud. The photogrammetric survey of the interior and exterior of the monument was planned at an average GSD of 1 cm and 2 cm, respectively, and performed with two cameras: a Sony Alpha 7 IV (33 Mpixel) with a full-frame Exmor R CMOS sensor coupled to a Sony FE 24-105mm f/4 G OSS lens, and a DJI Mini 3 Pro UAV mounting a 1/1.3-inch CMOS sensor (48 Mpixel) coupled with an f/1.7 wide-angle lens (82.1° FOV). In order to homogenize the colour of the images, Adobe Lightroom was used to adjust temperature, hue, exposure, highlights and shadows. For the purpose of this paper, data processing and analyses focus solely on the church's exterior (Figure 3), for two reasons:
• the amount of collected data (Table 1) is too large to be processed in a reasonable time while testing the fusion approaches;
• the exterior of the monument still allows us to review fusion methods and investigate their benefits for a complex architecture that presents elements like cornices and entablatures (which cast significant shadows), the dome (which TLS cannot effectively capture due to its curvature and elevation), and various surfaces, including marble, masonry and plaster.
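The planned GSD values above follow from the standard pinhole relation GSD = pixel pitch × distance / focal length. The Python sketch below turns that relation around to estimate the camera-to-surface distance needed for a target GSD. The sensor width and pixel count used for the Sony Alpha 7 IV (35.9 mm over 7008 px) are assumed manufacturer values, not figures from this survey, and the relation only holds for surfaces roughly parallel to the sensor.

```python
# Pinhole-model ground sampling distance (GSD); camera specs below are assumptions.

def gsd_m(pixel_pitch_um: float, focal_mm: float, distance_m: float) -> float:
    """GSD in metres on a surface parallel to the sensor at distance_m."""
    # GSD = pixel pitch x object distance / focal length, all expressed in metres
    return (pixel_pitch_um * 1e-6) * distance_m / (focal_mm * 1e-3)


def distance_for_gsd_m(pixel_pitch_um: float, focal_mm: float, target_gsd_m: float) -> float:
    """Camera-to-surface distance (m) that yields target_gsd_m."""
    return target_gsd_m * (focal_mm * 1e-3) / (pixel_pitch_um * 1e-6)


# Assumed, not from the paper: 35.9 mm sensor width over 7008 px (about 5.12 um pitch)
SONY_A7IV_PITCH_UM = 35.9e3 / 7008

if __name__ == "__main__":
    for focal in (24.0, 105.0):  # the two ends of the 24-105 mm zoom
        for gsd in (0.01, 0.02):  # planned interior / exterior GSD
            d = distance_for_gsd_m(SONY_A7IV_PITCH_UM, focal, gsd)
            print(f"f = {focal:.0f} mm, GSD = {gsd * 100:.0f} cm -> ~{d:.1f} m")
```

With these assumed values, a 1 cm GSD at 24 mm corresponds to a distance of roughly 47 m. Oblique views, image overlap and the UAV camera (whose parameters are not modelled here) would need separate treatment in real station or flight planning.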

DATA PROCESSING
The separate processing and high-level fusion (Figure 1) of the acquired data showed major downsides, such as deviations between the separate 3D results that could not be resolved (Figure 4). This is probably due to a scaling issue, given the limited number of available ground control points. Combining data from both acquisition methods at a lower level (raw or middle level) could therefore lead to a more detailed and accurate 3D representation of the monument that is both geometrically precise and visually rich, offering major advantages to architects, conservators and restorers. Data were thus processed at different levels of integration (Figure 1) to evaluate the benefit of data fusion in terms of reconstruction accuracy and noise. Three different scenarios were created (Figure 5) and two different photogrammetric software packages were used, Agisoft Metashape¹ (version 2.1.1) and Capturing Reality's Reality Capture² (version 1.3), as described below. The results are analysed in Section 5.

Scenario A: separate data processing with fusion at high level (HL)
The single TLS point clouds were registered in Leica Cyclone with an ICP (Iterative Closest Point) algorithm in order to create a single point cloud of the exterior (Figure 3c). For the image-based processing, the 4654 images, including both terrestrial and UAV images, were processed separately from the TLS data to produce coloured dense point clouds in both Agisoft Metashape (MS) (Figure 3b) and Reality Capture (RC). In both processing pipelines, GNSS metadata were ignored and the image orientation settings (image resolution, number of keypoints) were kept consistent between the two solutions. However, some differences are worth reporting:
• image orientation in RC produced six groups ("components") of oriented photos: one containing 4622 images, with the remaining 32 images distributed among the other 5 components;
• MS could orient all 4654 images, performing a self-calibration and retaining tie points with a multiplicity higher than 2.
¹ https://www.agisoft.com/
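The scan-to-scan registration in Leica Cyclone relies on ICP. A minimal point-to-point ICP (nearest-neighbour matching followed by an SVD/Kabsch rigid fit) is sketched below in Python with NumPy. It is an illustrative textbook version, not Cyclone's implementation, and its brute-force matching only suits small clouds; real scans would need a k-d tree, subsampling and outlier rejection.

```python
import numpy as np


def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t with R @ src_i + t ~ dst_i (Kabsch/SVD)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    h = (src - c_src).T @ (dst - c_dst)  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, c_dst - r @ c_src


def icp(source: np.ndarray, target: np.ndarray, max_iter: int = 50, tol: float = 1e-12):
    """Point-to-point ICP; returns (R, t, rms) aligning source onto target."""
    r_tot, t_tot = np.eye(3), np.zeros(3)
    current = source.copy()
    prev_rms = np.inf
    for _ in range(max_iter):
        # brute-force nearest neighbours: O(N*M) memory, fine only for small demo clouds
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        matched = target[d2.argmin(axis=1)]
        r, t = best_rigid_transform(current, matched)
        current = current @ r.T + t
        r_tot, t_tot = r @ r_tot, r @ t_tot + t  # compose with the previous estimate
        rms = np.sqrt(((current - matched) ** 2).sum(axis=1).mean())
        if prev_rms - rms < tol:  # stop when the residual no longer improves
            break
        prev_rms = rms
    return r_tot, t_tot, rms


if __name__ == "__main__":
    grid = np.mgrid[0:5, 0:5, 0:5].reshape(3, -1).T * 0.25  # 125 synthetic points
    a = np.radians(2.0)
    r_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a), np.cos(a), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.02, -0.01, 0.03])
    r_est, t_est, rms = icp(grid, grid @ r_true.T + t_true)
    print(np.round(r_est, 6), np.round(t_est, 6), rms)
```

On this noise-free grid the first nearest-neighbour assignment is already correct, so the transform is recovered almost immediately; real scan pairs need a reasonable initial alignment for ICP to converge to the right solution.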

Scenario B: integrated data processing with fusion at middle level (ML)
Both Metashape and Reality Capture allow the simultaneous processing of TLS and image data within an adjustment procedure.Co-registered laser scans and unordered images serve as input to the reconstruction pipelines in this scenario.
In both RC and MS, the TLS point cloud was imported in .e57 format as a structured cloud (i.e., containing the 3D points and a 360° panoramic image for each scan location) and kept fixed as the reference onto which the images are oriented. It was observed that:
• RC uses a cube-like format for the TLS panoramic images (Figure 6a). The 4654 photogrammetric images could be oriented with the 56 scans only after manually adding 34 markers (tie points);
• MS uses an equirectangular projection for the TLS data (Figure 6b). Despite some memory issues caused by the large quantity of TLS data and images, it could align the photogrammetric images and TLS data without the need for markers.
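The two panoramic layouts differ only in how a viewing direction from the scanner is mapped to an image. The Python sketch below maps a point in the scanner frame either to spherical angles (equirectangular) or to the cube face its direction hits. The axis conventions and face naming are assumptions for illustration; the conventions actually used in E57 files, RC or MS may differ.

```python
import math


def equirect_pixel(x: float, y: float, z: float, width: int, height: int):
    """Scanner-frame point -> (u, v) pixel in an equirectangular panorama.

    Assumed convention: azimuth from +x towards +y maps to u in [0, width);
    elevation maps to v, with v = 0 at the zenith (+z) and v = height at the nadir.
    """
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)    # [-pi, pi]
    elevation = math.asin(z / r)  # [-pi/2, pi/2]
    u = (azimuth + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - elevation) / math.pi * height
    return u, v


def cube_face(x: float, y: float, z: float):
    """Direction -> (face, s, t) on a cube map, with s, t in [0, 1].

    Hypothetical layout for illustration: real cube-map formats differ in face
    naming, ordering and in-face axis orientation.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, major, a, b = ("+x" if x > 0 else "-x"), ax, y, z
    elif ay >= az:
        face, major, a, b = ("+y" if y > 0 else "-y"), ay, x, z
    else:
        face, major, a, b = ("+z" if z > 0 else "-z"), az, x, y
    # central projection onto the face plane, then rescale [-1, 1] -> [0, 1]
    return face, (a / major + 1.0) / 2.0, (b / major + 1.0) / 2.0


if __name__ == "__main__":
    p = (3.0, 4.0, 1.5)  # hypothetical point a few metres from the scanner
    print(equirect_pixel(*p, width=8192, height=4096))
    print(cube_face(*p))
```

As a design note, the equirectangular layout stores the whole sphere in one image but strongly oversamples near the zenith and nadir, while the cube layout gives a more uniform sampling per face at the cost of seams between faces.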

Scenario C: integrated data processing with fusion at raw level (RL)
Metashape and Reality Capture enable the simultaneous processing of TLS and image data without any prior co-registration of the single TLS scans. For this fusion level, unregistered laser scans and unordered images are fed to the reconstruction pipelines. Each scan (3D points and panoramic image) was imported in .e57 format in both Metashape and Reality Capture to perform the registration:
• RC could orient all 4654 photogrammetric images but only 36 of the 56 scans, despite the use of multiple markers (up to 53);
• MS could orient all 4654 photogrammetric images but only 37 of the 56 scans (7 were not aligned and 12 were misaligned).
Figure 7 shows how the scans registered (or not) by RC and MS are distributed differently across the surveyed area: while RC failed to align most of the scans acquired on the two terraces, MS struggled with those located along one entire façade.
In all three scenarios, after data registration, a dense image matching process is applied to generate the final point cloud of the surveyed monument. The resulting dense point clouds (Table 3) are evaluated in terms of noise over flat surfaces, visual clarity and completeness.

RESULTS AND ANALYSES
To quantitatively evaluate the three fusion processes, four areas featuring flat surfaces are considered:
• P1: a vertical masonry wall of ca. 2 × 4 m;
• P2: a vertical plaster wall of ca. 2.5 × 1.5 m;
• P3: a vertical masonry wall of ca. 2 × 1.5 m;
• P4: a horizontal pavement of ca. 4 × 7 m.
In each area, a patch containing a sufficient number of points is extracted. Then, using CloudCompare³, a plane is fitted to each patch and the Cloud-to-Plane (C2P) distances are computed, limiting the range to ±3σ. Table 4 reports the standard deviation of the C2P distances for each patch, and Figure 8 shows visual comparisons of the noise in the selected patches for the different fusion levels. Note that co-registration errors may affect the extraction of the patches, producing slightly different selections of points. For this reason, the patches shown in Figure 8 might have small inconsistencies but retain statistical significance.
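The C2P evaluation can be sketched as a plane fit followed by signed point-to-plane distances. The Python sketch below uses a total-least-squares fit (SVD of the centred points), which is one common choice; CloudCompare's exact fitting routine is not reproduced here, and whether the Table 4 values were computed before or after the ±3σ limit is an assumption, so both are returned.

```python
import numpy as np


def fit_plane(points: np.ndarray):
    """Total-least-squares plane through an (N, 3) array: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # right singular vector with the smallest singular value = direction of least spread
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[-1]


def c2p_std(points: np.ndarray, clip_sigma: float = 3.0):
    """Signed cloud-to-plane distances and their standard deviation.

    Returns (std_all, std_clipped, distances); std_clipped keeps only
    |d| <= clip_sigma * std_all (an assumed reading of the +/-3 sigma limit).
    """
    centroid, normal = fit_plane(points)
    d = (points - centroid) @ normal  # signed distances; their mean is zero by construction
    std_all = d.std()
    std_clipped = d[np.abs(d) <= clip_sigma * std_all].std()
    return std_all, std_clipped, d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xy = rng.uniform(0.0, 2.0, size=(5000, 2))         # hypothetical 2 x 2 m patch
    z = 0.1 * xy[:, 0] + rng.normal(0.0, 0.002, 5000)  # tilted wall with ~2 mm noise
    s_all, s_clip, _ = c2p_std(np.column_stack([xy, z]))
    print(f"std = {s_all * 1000:.2f} mm, clipped = {s_clip * 1000:.2f} mm")
```

The clipped and unclipped values only differ when outliers beyond ±3σ are present; the synthetic patch above is illustrative and does not reproduce any value of Table 4.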

High-level (HL) fusion
As expected, the high-level (HL) fusion results feature the highest amount of noise, directly inherited from the separate TLS and photogrammetric point clouds, which are simply merged. For all patches, merging the photogrammetry-only and the registered TLS point clouds is detrimental to the overall 3D quality: as shown in Table 4, the standard deviations of the fused patches are generally higher than those of the TLS-only or photogrammetry-only patches. Since no filtering or selection heuristics are applied to the inputs at this fusion level, the same surface may be represented twice in the final point cloud, once by each source. For this reason, high-level fusion produces the point clouds with the highest total number of points (Table 3). TLS-like linear patterns are clearly visible in the HL patches P1, P2 and P3, while photogrammetry-like patterns become noticeable when applying shader graphics (Eye-Dome Lighting, EDL), as shown in Figure 8 for patch P4. In terms of completeness, HL point clouds generally outperform the other fusion methods, as they make use of all acquired laser scans and photogrammetric clouds. This is shown in Figure 8 and Figure 9, where details such as thin railings or weeds are well represented. Nevertheless, higher completeness does not translate into better quality of the 3D data.
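The doubling of points at high level follows directly from fusing by concatenation. The Python sketch below contrasts the raw point count of a merged cloud with the number of occupied voxels, a simple hypothetical diagnostic (not part of the workflow described here) for how much of a merged cloud is redundant overlap; the 1 cm voxel size and the synthetic clouds are assumptions.

```python
import numpy as np


def high_level_merge(*clouds: np.ndarray) -> np.ndarray:
    """High-level fusion as plain concatenation: no filtering, overlaps are kept twice."""
    return np.vstack(clouds)


def occupied_voxels(points: np.ndarray, voxel: float = 0.01) -> int:
    """Number of distinct cubic voxels (edge `voxel`, metres) containing at least one point."""
    keys = np.floor(points / voxel).astype(np.int64)
    return len(np.unique(keys, axis=0))


if __name__ == "__main__":
    # hypothetical overlapping clouds: the same 1 m x 1 m area seen by TLS and photogrammetry
    tls = (np.mgrid[0:100, 0:100, 0:1].reshape(3, -1).T + 0.5) * 0.01
    photo = tls + np.array([0.0, 0.0, 0.001])  # 1 mm offset standing in for residual misregistration
    merged = high_level_merge(tls, photo)
    print(len(merged), "points in", occupied_voxels(merged), "occupied voxels")
```

A point count that grows much faster than the number of occupied voxels signals duplicated surfaces rather than additional coverage, which is consistent with the observation that higher completeness does not by itself mean better 3D quality.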

Middle- and raw-level (ML, RL) fusion
The successful integration of the input sources in both middle-level (ML) and raw-level (RL) fusion produces less noise than high-level (HL) fusion. Considering the analysed patches (Table 4 and Figure 8), ML fusion tends to outperform the single TLS and photogrammetric data, as well as the other fusion approaches, in terms of standard deviation and visual clarity. Real geometric patterns in highly textured patches tend to be visible and become evident when applying shader graphics, as shown in the P4 row of Figure 8. Both ML and RL behave similarly to TLS, attenuating most of the noise carried by the photogrammetry-only point clouds. RL is penalized by the lack of an a priori co-registration of the TLS scans (and indeed not all TLS scans could be registered). This is visible for P4 in Figure 8, where a discontinuity appears in the pavement of the RL cloud because of a misregistration in the vertical coordinate. The performance of the RL integration is therefore highly dependent on the success rate in aligning all available laser scans with the photogrammetric images. In this case study, ML makes use of the complete set of laser scans, as provided by the initial co-registration performed outside the tested software, while RL suffers a loss of completeness, most probably due to the unregistered laser scans, as shown schematically in Figure 7.
Figure 9 shows an example: the portal on the east side, for which only part of the scans was successfully aligned in the RL fusion. In the RL cloud, the lower part of the door and the railing in front of it show more missing points than in the ML cloud, which used all the scans for this side. Indeed, in this section the RL cloud is visually similar to the photogrammetric point cloud. The same reason can explain the differences between ML and RL fusion for small details such as the railings on top of the dome, as shown in Figure 10: here, photogrammetry is not able to fill the gaps left by the unused laser scans in RL, while ML provides a more complete reconstruction.

CONCLUSIONS
The work presented an investigation of TLS and image data fusion for the 3D documentation of complex architectures. The two surveying techniques both have advantages and disadvantages, and the literature has already demonstrated that their fusion leads to benefits and improvements in the results. However, in particular for terrestrial applications, a complete and efficient fusion is rarely performed other than at high level (Figure 1). Raw- and middle-level fusion experiments were therefore performed using commercial software. Visual and numerical results highlight the relevant potential of advanced fusion approaches with respect to the conventional a-posteriori combination (high level). Despite the evident advantages in terms of better geometry and less noise in the results, some bottlenecks of the current processing approaches were highlighted, especially in the raw-level case. Further analyses are needed to investigate the raw- and middle-level fusion of laser scanning and photogrammetric data, in particular when, unlike in aerial acquisitions, no sensor trajectory is available. Learning-based approaches could improve the registration of TLS and photogrammetric data whereas the
Figure 2: The church of Santa Maria di Loreto in Rome used to evaluate the fusion methodology. Real view (a), point cloud (b).
Figure 3: Data processing results: retrieved camera poses and sparse point cloud for the 4654 images (a) and colorized dense point cloud (b) from Agisoft Metashape. The co-registered TLS point cloud (56 scans) shown with its intensity values (c).

Figure 4: Deviations and misalignments between the coloured photogrammetric cloud and slices from TLS data.

Figure 5: The three scenarios to evaluate data fusion.
Figure 6: Examples of the panoramic images, cubic (a) and equirectangular (b), used to fuse scanning data and images.

Figure 7: Positions of correctly registered (green) and non-registered (red) TLS scans from Metashape (left) and Reality Capture (right).

Table 3: Number of points (vertices) in the dense point clouds generated with the three fusion methods.
Scenario           Metashape (MS)    Reality Capture (RC)
Scenario A (HL)    549 million       557 million
Scenario B (ML)    169 million       410 million
Scenario C (RL)    158 million       323 million

Figure 8: Noise analyses on patches at the different fusion levels. Distances from the fitted planes in metres; values limited to ±3σ.

Table 1: Data acquired for the 3D documentation of the heritage monument and the evaluation of the fusion approach.

Table 4: Standard deviations (mm) of C2P distances for extracted patches at different fusion levels.