GNSS ASSISTED PHOTOGRAMMETRIC RECONSTRUCTION FROM COMBINED 360° VIDEOS AND UAV IMAGES

This paper introduces an integrated approach using ground data, consisting of videos captured with a 360° (spherical) camera, and aerial data acquired with a UAV equipped with an RTK GNSS module, to reconstruct a portion of a small-town city center and/or a cultural heritage site. Previous research has demonstrated that image blocks oriented with RTK camera positions can reach centimeter accuracy and can be efficiently used to reconstruct large areas and single monuments. However, some areas such as porches, narrow passages and streets cannot be properly reconstructed from an aerial point of view. Conversely, ground-based 360° images offer detailed insights into terrain and features that may be obscured from an aerial perspective. Integrating these two points of view can increase spatial resolution and coverage for 3D reconstruction: the UAV captures large-scale features and topography, while ground-based 360° images focus on intricate details and ground-level characteristics. GNSS data acquired by the UAV may also be exploited for GNSS-assisted image orientation, with the aim of reducing or even avoiding, in specific situations, the need for ground control points (GCPs). The paper explores practical applications of such data integration in the cultural heritage domain, demonstrating the efficacy of the integrated approach in scenarios with complex architectures and inaccessible areas.


INTRODUCTION
Georeferenced 3D models of complex architectures and buildings are increasingly requested in various domains, such as engineering, architecture and environmental sciences. Traditional photogrammetry and structure-from-motion workflows (Remondino et al. 2014), as well as computer vision and computer graphics methods such as Neural Radiance Fields (NeRF; Mildenhall et al., 2021) and 3D Gaussian Splatting (Kerbl et al. 2023), are gaining substantial interest as techniques for creating 3D point clouds and models and for novel-view synthesis of 3D scenes. Compared with other solutions such as LiDAR, image-based techniques have the advantage of relying on cost-effective equipment and of enabling the rendering of realistic scenes. Among low-cost devices (~500 €), 360° cameras are increasingly used in photogrammetric applications across various sectors, including narrow-space cultural heritage documentation (Valente et al. 2023), tunnel surveying for civil and architectural projects (Janiszewski et al., 2022), autonomous driving (Petrovai and Nedevschi, 2022) and more. The growing appeal of 360° photogrammetry is due to its high efficiency during data acquisition, since the complete surrounding area can be captured in a relatively short time. Drones are another image acquisition platform, widely investigated and used in several projects for documentation and 3D reconstruction.
These two data sources can be viewed as complementary. While data from Unmanned Aerial Vehicles (UAVs) are valuable for reconstructing open areas, building roofs and the surrounding topography, 360° images can provide ground-level data of enclosed spaces, such as building porches, and details that are challenging to capture from an aerial perspective. A further complementarity concerns data georeferencing. A good distribution of Ground Control Points (GCPs) is of primary importance for reliable georeferencing of image blocks composed of 360° images (see also Teppati Losè et al., 2021; Barazzetti et al., 2022). Nonetheless, placing GCPs can be challenging, as certain areas may be hard to reach and the stability of candidate points may be doubtful. This process can also be time-consuming in the field, particularly when GCPs are distributed over an extensive area. In addition, measuring GCPs can be laborious during image processing, since each GCP has to be manually identified in multiple images, which introduces the potential for operator errors. On the other hand, real-time kinematic (RTK) and post-processed kinematic (PPK) technologies have proved effective in "GNSS-assisted photogrammetry" applied to UAVs, achieving centimetre-level precision in 3D model georeferencing without the need for GCPs.
Although such complementarities exist in theory, few works in the literature address the effective combination of UAV data and 360° images for 3D reconstruction, or the possibility of using "GNSS-assisted photogrammetry" to georeference both UAV and ground-based 360° images in a combined bundle block adjustment. This paper focuses on the "GNSS-assisted" bundle adjustment of UAV and ground 360° images, exploiting camera positions derived from RTK/PPK UAV data, and presents case studies addressing urban and cultural heritage applications.
The paper is structured as follows: section 2 reviews related works, mainly focusing on 360° image and UAV photogrammetry; section 3 discusses the acquisition strategy for the combined bundle adjustment of UAV and 360° images; section 4 gives an overview of the data processing stage, covering 360° image stitching and frame selection, PPK data processing and the software used for the bundle adjustment; section 5 presents the tests carried out to validate the proposed method and discusses the results; section 6 draws conclusions and outlines possible improvements of the proposed methodology and future works.

RELATED WORKS
In the last decade, the market for consumer-grade cameras has grown exponentially, and a broad range of new low-cost, easy-to-use sensors is available today. Two sectors in particular have received significant attention and market success: 360° cameras and consumer-grade UAVs. The availability of such low-cost sensors has stimulated research on their use in several application fields. Demand for 360° cameras increased significantly in the last decade for applications such as 360° photography and video as well as AR/VR projects. This led to consumer-grade cameras offering high resolution for both photos (23MP) and videos (5.7k) and high frame rates (30 fps at 5.7k resolution) at an affordable price (~500 €). Such low-cost consumer-grade cameras also offer interesting potential for photogrammetric applications, which has opened a new branch of research in the literature.
The topic of "spherical photogrammetry" was introduced by Fangi (2007), who defined the concept and its mathematical model. From a practical point of view, a spherical image is generally obtained by stitching a set of images acquired from the same nodal point. While earlier works solved this problem by rotating a single camera around the nodal point and stitching the images (Szeliski and Shum, 1997), the real shift towards modern consumer-grade 360° cameras was the possibility of acquiring several images at the same time with different cameras and stitching them either on the fly or off-line. These systems consist of several cameras (at least two) that synchronously acquire images in different directions all around the device. The images acquired in this way are then stitched together to create a single 360° image, which is generally rendered using the equirectangular (also known as latitude/longitude) projection. The availability of multiple cameras also allows the acquisition of data streams that can be combined to create 360° videos (this is not possible when a single rotating camera is used to compose the panorama).
As previously mentioned, several recent works in the literature address consumer-grade 360° cameras for photogrammetric applications, mainly focusing on practical aspects of the photogrammetric workflow. Among others, the management of the large amount of acquired data and its effective orientation, the effects of parallax among the different cameras during the stitching phase that produces the final equirectangular projection, and the optimal image acquisition strategy represent challenges for the use of spherical cameras in practical applications. Works on spherical cameras address the survey of narrow spaces (Valente et al. 2023) such as tunnels and caves, and the documentation of architectural structures such as the interior of bell towers (Teppati Losè et al., 2021) or small-town city centres (Barazzetti et al., 2022). Such works demonstrated that spherical photogrammetry with consumer-grade cameras can be a useful tool for reconstructing both indoor and outdoor areas. However, a good distribution of GCPs is fundamental to guarantee the metric accuracy of the results, and the completeness of the reconstruction can be quite low at the upper floors of buildings.
In the last decade, UAVs have also achieved significant market success, with high-resolution camera payloads available not only on professional drones but also on lightweight consumer-grade and entry-level products. The feasibility of consumer-grade UAVs for the documentation and 3D reconstruction of large sites has been proved not only in the literature (Hill, 2019; Kerle et al., 2019; Ren et al., 2019; Elkhrachy 2021; Deliry and Avdan, 2021) but also in everyday professional practice. GNSS and inertial data for assisting bundle block adjustment were introduced on aerial platforms in the early 2000s (Forlani and Pinto, 2002) to reduce the number of control points, and the concept is now applied to drone data as well. In particular, UAVs equipped with GNSS sensors and RTK/PPK (Real-Time Kinematic, Post-Processed Kinematic) processing have demonstrated accuracies comparable to those achieved with GCPs (Tomaštík et al., 2019; Ekaso et al., 2020). While some studies deal with both 360° cameras and UAVs, they either focus on mounting 360° cameras on a UAV platform (Calantropio et al., 2019) or process UAV and spherical data separately and then merge them into a single model (Aires et al., 2022). In contrast, this paper focuses on integrating UAV data with ground data acquired with a 360° camera. In a previous work (Previtali et al., 2023a), we discussed the feasibility of a combined approach using ground data, consisting of videos acquired with a 360° camera, and aerial data, recorded with a lightweight consumer-grade UAV, to comprehensively reconstruct part of a small-town city centre. However, in that work image orientation and georeferencing were performed with a standard approach based on GCPs and Check Points (CPs). In this work, we investigate the possibility of combining UAV and ground 360° images in a single framework without GCPs, by using RTK/PPK UAV data for a "GNSS-assisted" combined bundle adjustment.
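A generic least-squares formulation may help clarify what a "GNSS-assisted" combined adjustment means. The expression below is a textbook-style sketch in our own notation, not necessarily the functional model implemented in the software used in this work:

```latex
\min_{\{\mathbf{R}_j,\mathbf{C}_j\},\,\{\mathbf{X}_i\},\,\boldsymbol{\theta}}
\sum_{(i,j)} \left\| \mathbf{x}_{ij} - \pi_j\!\left(\mathbf{R}_j(\mathbf{X}_i - \mathbf{C}_j);\,\boldsymbol{\theta}\right) \right\|^{2}_{\boldsymbol{\Sigma}_{x}^{-1}}
+ \sum_{j \in \mathcal{G}} \left\| \mathbf{C}^{\mathrm{GNSS}}_{j} - \left(\mathbf{C}_j + \mathbf{R}_j^{\top}\mathbf{a}_j\right) \right\|^{2}_{\boldsymbol{\Sigma}_{C,j}^{-1}},
\qquad \|\mathbf{v}\|^{2}_{\mathbf{W}} = \mathbf{v}^{\top}\mathbf{W}\,\mathbf{v}
```

Here x_ij is the observation of tie point X_i in image j; R_j (object-to-camera rotation) and C_j (projection centre) are the exterior orientation of image j; pi_j is a pinhole projection with interior and distortion parameters theta for UAV frames and the equirectangular (longitude/latitude) projection for 360° frames; G is the set of images with GNSS positions C_j^GNSS (UAV images and, when available, geotagged 360° frames); a_j is the antenna-to-camera offset expressed in the camera frame (zero if already applied during geotagging); the covariance matrices weight image measurements and RTK/PPK positions. No GCP term appears: the datum is fixed by the GNSS camera positions, while CPs stay outside the adjustment and are used only for validation.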

ACQUISITION SYSTEM AND STRATEGY
The acquisition system considered in the tests presented in the paper consists of a UAV platform and a 360° camera. The UAV adopted is the DJI Mavic 3 Enterprise. It is equipped with a Hasselblad camera (4/3 CMOS, 20 MPix, 84° field of view) and can either geotag images in RTK mode or log GPS, Galileo and BeiDou observations for PPK post-processing and subsequent image geotagging. Both RTK and PPK were used in the tests presented in this paper. The drone images followed a typical grid pattern with nadir camera orientation, complemented by a set of oblique images (camera tilt of 35° or less). The oblique images provide views of vertical surfaces and thus facilitate image matching between UAV data and ground 360° images. They do not need to cover the entire surveyed area, since areas not covered by oblique images can be reconstructed from ground data.
Ground data are acquired with an Insta360 ONE X2 and consist of 360° videos at 5.7k resolution. The main advantage of capturing 360° videos is the efficiency of the acquisition: users can simply walk through the area to be surveyed with the camera mounted on a selfie stick. Specific frames at 5.7k resolution are then extracted from the acquired video. Several strategies can be adopted to sample frames, ranging from a simple fixed user-defined frame rate (e.g., two frames per second) to more sophisticated criteria. However, we observed that an overly coarse sampling can be detrimental to the photogrammetric workflow (mainly image alignment and reconstruction). For this reason, we suggest a sampling interval of no more than a few seconds, also depending on the walking speed. In the dataset used in this paper, the video acquisition mode was set to automatic, so that the camera adapts ISO and shutter speed to the changing light conditions during the acquisition (e.g., when passing from a fully illuminated area to a porch). This choice may produce some low-quality frames (e.g., blurred and/or under- or over-exposed) in passageways. For this reason, a specific "frame quality check" was designed (see section 4), and during acquisition the walking speed is reduced near doors and at any connection between spaces with different illumination conditions. In addition, the ground camera can be mounted on top of a geodetic pole equipped with an Emlid Reach RS2 antenna (Figure 1). This configuration allows the camera and antenna to acquire data that can be used to geotag 360° images, as presented in Previtali et al., 2023.
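The trade-off between sampling interval and walking speed can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: constant walking speed, a 30 fps video (the frame rate quoted above for 5.7k recording) and hypothetical helper names; the maximum acceptable spacing between keyframes is a user choice, not a value derived from our tests.

```python
from typing import List


def keyframe_indices(duration_s: float, video_fps: float, interval_s: float) -> List[int]:
    """Indices of the frames sampled every `interval_s` seconds from a video."""
    if duration_s <= 0 or video_fps <= 0 or interval_s <= 0:
        raise ValueError("duration, frame rate and interval must be positive")
    step = interval_s * video_fps                  # frames between two keyframes
    n_frames = int(round(duration_s * video_fps))  # total frames in the video
    indices = []
    k = 0
    while round(k * step) < n_frames:
        indices.append(int(round(k * step)))
        k += 1
    return indices


def keyframe_baseline_m(walking_speed_mps: float, interval_s: float) -> float:
    """Approximate distance between consecutive keyframes at constant walking speed."""
    return walking_speed_mps * interval_s


def max_interval_s(walking_speed_mps: float, max_baseline_m: float) -> float:
    """Longest sampling interval keeping the keyframe spacing below `max_baseline_m`."""
    return max_baseline_m / walking_speed_mps


# A 10 s clip at 30 fps sampled at one frame per second, as in the tests.
print(keyframe_indices(10, 30, 1.0))  # [0, 30, 60, 90, 120, 150, 180, 210, 240, 270]
print(keyframe_baseline_m(1.2, 1.0))  # keyframe spacing in metres at an assumed 1.2 m/s
```

At an assumed walking speed of about 1.2 m/s, one frame per second places keyframes roughly 1.2 m apart, while a 3 s interval already spreads them about 3.6 m apart; slowing down near doors, as suggested above, shortens this spacing where illumination changes make frames less reliable.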

DATA PROCESSING
Processing of the data is divided into three main parts:
• 360° video stitching and frame extraction;
• drone data geotagging;
• combined image matching of drone images and ground 360° video frames, and "GNSS-assisted" bundle adjustment.
First, the camera producer's dedicated software Insta360 Studio 2023 (Figure 2) is used to process and stitch the raw files (.insv). The aim of this task is to stitch together the two video streams recorded by the two fisheye sensors of the camera. Stitching is performed with the Optical Flow option available in the software to minimize parallax effects. No colour correction is applied. The stitched video is then exported in .MP4 format using the equirectangular projection. In contrast to a conventional camera, the 360° camera captures all surrounding surfaces without favouring any specific direction. Equirectangular frames are extracted from the recorded 5.7k video at a defined frame rate: one frame per second is used in all the tests reported here. The choice of frame rate depends on parameters such as walking speed during acquisition and object distance. During frame extraction, a "frame quality check" is carried out to avoid extracting blurred and under- or over-exposed frames (Figure 3). If such frames are detected, they are discarded and a new frame is extracted from the video. A Laplacian filter is used to detect blurred frames, while analysis of the image histogram allows under- and over-exposed frames to be detected. If GNSS data are recorded during the ground acquisition, they are used to geotag the keyframes extracted from the video (see Previtali et al., 2023), which then serve as a further constraint in the "GNSS-assisted photogrammetry" workflow. Otherwise, no position is associated with the keyframes, and their position is derived directly from the bundle adjustment with the UAV images.
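The frame quality check can be sketched as follows. This is an illustrative NumPy version, not the exact implementation used in our tests: thresholds and function names are hypothetical. Blur is scored by the variance of a 3 × 3 Laplacian (close to the common OpenCV idiom `cv2.Laplacian(gray, cv2.CV_64F).var()`), exposure by the share of pixels in the dark and bright tails of the grey-level histogram, and a rejected frame is replaced by the nearest acceptable one within a small window, trying later frames first (as in the 0.7 s shift of Figure 3).

```python
import numpy as np

# Hypothetical thresholds: they depend on resolution, scene content and camera,
# and would need tuning on real 5.7k equirectangular frames.
BLUR_THRESHOLD = 100.0  # minimum variance of the Laplacian
CLIP_FRACTION = 0.25    # maximum share of nearly black or nearly white pixels


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB frame to float grey levels (BT.601 weights)."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian on the frame interior; low means blurred."""
    lap = (gray[:-2, 1:-1] + gray[2:, 1:-1] + gray[1:-1, :-2] + gray[1:-1, 2:]
           - 4.0 * gray[1:-1, 1:-1])
    return float(lap.var())


def clipped_fractions(gray: np.ndarray, low: int = 10, high: int = 246):
    """Shares of pixels below `low` and at or above `high` grey levels."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = hist.sum()
    return hist[:low].sum() / total, hist[high:].sum() / total


def is_good_frame(rgb: np.ndarray) -> bool:
    """True if the frame is sharp enough and neither under- nor over-exposed."""
    gray = to_gray(rgb)
    dark, bright = clipped_fractions(gray)
    return (laplacian_variance(gray) >= BLUR_THRESHOLD
            and dark <= CLIP_FRACTION and bright <= CLIP_FRACTION)


def pick_frame(frames, target: int, max_offset: int):
    """Index of the acceptable frame closest to `target` (later frames first), or None."""
    for d in range(max_offset + 1):
        for idx in ((target,) if d == 0 else (target + d, target - d)):
            if 0 <= idx < len(frames) and is_good_frame(frames[idx]):
                return idx
    return None
```

With a 30 fps video, `max_offset=30` would limit the search to about ±1 s around the nominal keyframe (an assumed window). Searching in both directions keeps the replacement close to the intended sampling position, so the spacing between keyframes is only locally perturbed.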
In the second phase, the drone data are geotagged. If RTK is used, no further processing is needed and the images can be directly combined with the 360° frames in the combined bundle adjustment. Conversely, if PPK is used, the UAV log data (RINEX) are processed with Emlid Studio 1.7 (Figure 4), first to solve for the drone positions and then, by importing the MRK file generated by the Mavic, to geotag the acquired UAV images. For each picture, the MRK file stores the time at which it was taken as well as the North, East and Down offset from the GNSS antenna phase centre (APC) to the camera CMOS sensor. In the final phase, the equirectangular frames and the drone images undergo conventional photogrammetric processing, comprising image orientation and dense point cloud generation. Agisoft Metashape 2.0.2 was used for image orientation in all the tests presented in this paper.
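The PPK geotagging principle (not Emlid Studio's internal implementation) can be sketched in two steps: interpolating the post-processed antenna trajectory at the exposure times stored in the MRK file, then shifting each antenna position to the camera with the per-image North, East, Down offset. This sketch assumes the trajectory has already been converted to a local East-North-Up frame in metres, that the offsets are in millimetres and that they point from the APC to the camera (so they are added); units and sign convention should be checked against the DJI documentation before any real use. Function names are hypothetical.

```python
import numpy as np


def interpolate_positions(traj_t, traj_enu, event_t):
    """Linearly interpolate a PPK trajectory (local E, N, U in metres) at exposure times.

    `traj_t` must be strictly increasing (e.g. GPS seconds of week).
    """
    traj_t = np.asarray(traj_t, dtype=float)
    traj_enu = np.asarray(traj_enu, dtype=float)
    event_t = np.asarray(event_t, dtype=float)
    if np.any(event_t < traj_t[0]) or np.any(event_t > traj_t[-1]):
        raise ValueError("exposure time outside the PPK solution")
    return np.column_stack([np.interp(event_t, traj_t, traj_enu[:, k]) for k in range(3)])


def apply_ned_offset(apc_enu, offset_ned_mm):
    """Shift antenna positions to the camera using per-image N, E, D offsets in mm."""
    apc_enu = np.asarray(apc_enu, dtype=float)
    off = np.asarray(offset_ned_mm, dtype=float) / 1000.0  # mm -> m
    # NED -> ENU: East = E, North = N, Up = -Down
    delta_enu = np.column_stack([off[:, 1], off[:, 0], -off[:, 2]])
    return apc_enu + delta_enu
```

Linear interpolation is usually adequate when the PPK solution rate is high compared with the drone's speed changes; the offset step matters because the antenna and the sensor are separated by a distance of the same order as the centimetre accuracy sought.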
To validate the image orientation results, Check Points (CPs) measured with GNSS are used. Furthermore, the accuracy and completeness of the point cloud obtained from the combined reconstruction of UAV and 360° images are assessed by comparing the photogrammetric 3D model with a laser scanning survey of the building.
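Summary statistics of CP residuals, of the kind commonly reported when validating image orientation, can be computed as below. This is a minimal sketch assuming that the photogrammetric and GNSS coordinates of the same CPs are given in the same projected CRS and in the same order; the function name is hypothetical.

```python
import numpy as np


def cp_statistics(estimated, reference):
    """Per-axis mean, standard deviation and RMSE of CP residuals, plus 3D RMSE.

    `estimated`: CP coordinates from the oriented photogrammetric block (n x 3, metres).
    `reference`: the same CPs surveyed with GNSS (n x 3, metres).
    """
    res = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    if res.ndim != 2 or res.shape[1] != 3 or len(res) < 2:
        raise ValueError("expected at least two CPs with three coordinates each")
    return {
        "mean": res.mean(axis=0),                 # systematic shift per axis
        "std": res.std(axis=0, ddof=1),           # dispersion per axis
        "rmse": np.sqrt(np.mean(res ** 2, axis=0)),
        "rmse_3d": float(np.sqrt(np.mean(np.sum(res ** 2, axis=1)))),
    }
```

Reporting the mean alongside the RMSE helps distinguish a global datum shift (for example from a biased GNSS solution) from random orientation errors.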

RESULTS
The presented approach was tested in three case studies, primarily focusing on small-town urban mapping and cultural heritage documentation.
The first case study was a feasibility test conducted on the University Campus, aimed at surveying a two-storey building from the XIX century. This test sought to assess the accuracy of the proposed method by comparing its results with those achieved through georeferencing based on Ground Control Points (GCPs). The final 3D model was then compared with a reference dataset consisting of scans acquired with a Faro Focus X130 scanner.
The second case study ("San Michele") shows the application of the proposed method to the survey of a small hamlet and its main church. Results are evaluated on a set of CPs measured in RTK mode, confirming the suitability of the proposed approach.
Finally, we present the survey of a fortress in the Italian Alps ("Rocca di Vogogna"). This test presents some typical products (i.e., point clouds, orthophotos, etc.) that can be obtained by combining UAV and ground 360° data.

University Campus test
The first test was carried out in a courtyard of the Politecnico di Milano (Lecco Campus) around a XIX-century building. The building was surveyed with 137 UAV images, a total of 950 untagged (i.e., without an associated position) 5.7k frames extracted from a 360° video recorded all around the building, and 9 geotagged 360° images. A large tree limited the drone survey of a portion of the building. This part was acquired with 360° images only and exemplifies the rationale of the presented approach: UAV and ground 360° images are complementary data that improve the completeness of the reconstructed 3D model of a building. Orientation results of the three tests are presented in Figure 5 and Table 1. For all three tests, they show precisions on CPs and camera positions comparable with those expected for RTK measurements. The reconstructed 3D model of the building is presented in Figure 6. As can be observed, the north façade (Figure 6b-c, Figure 7a) reconstructed with UAV and ground 360° images is more complete than the one obtainable from UAV data only, mainly in the areas under the eaves. Similar considerations apply to the sports shelter (Figure 7b): while it is almost undefined with UAV data alone, it is clearly reconstructed when both UAV and 360° images are used. To verify the metric accuracy, the photogrammetric point cloud obtained by dense matching of UAV and 360° camera images was finally compared with a set of reference scans acquired with a Faro Focus X130. The comparison was carried out in CloudCompare (https://www.danielgm.net/cc/) on common areas, and the results in terms of unsigned cloud-to-cloud distance are depicted in Figure 8.
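In CloudCompare's default C2C mode, the unsigned cloud-to-cloud distance is the distance from each point of the compared cloud to its nearest neighbour in the reference cloud (a local surface model can optionally refine it). The brute-force NumPy sketch below illustrates that definition and the two summary figures discussed for this test (mean distance and share of points below 2 cm). It is only practical for small clouds; a spatial index such as `scipy.spatial.cKDTree` would be the usual choice for real scans. Function names are hypothetical.

```python
import numpy as np


def c2c_distances(compared, reference, chunk=64):
    """Unsigned nearest-neighbour distance from each compared point to the reference cloud."""
    compared = np.asarray(compared, dtype=float)
    reference = np.asarray(reference, dtype=float)
    out = np.empty(len(compared))
    for start in range(0, len(compared), chunk):
        block = compared[start:start + chunk]
        # squared distances between every point of the block and every reference point;
        # the difference form avoids cancellation with large projected coordinates
        d2 = ((block[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk] = np.sqrt(d2.min(axis=1))
    return out


def summarize(dist, threshold=0.02):
    """Mean distance and share of points below `threshold` (same units as the clouds)."""
    dist = np.asarray(dist, dtype=float)
    return float(dist.mean()), float((dist < threshold).mean())
```

Processing the compared cloud in chunks bounds memory to roughly chunk × N_reference × 3 values, at the cost of quadratic run time.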

"San Michele" dataset
The second case study is the "San Michele" hamlet in the municipality of Torre de'Busi (BG-Italy). It comprises the Church of San Michele Arcangelo, the Oratory of Santo Stefano, the Stations of the Cross chapels, the rectory and some houses built close to them (Figure 9). The church, dating back to the 12th century and standing isolated atop a picturesque rocky spur, is one of the oldest in the Valley of St. Martino.
The complex was surveyed with 224 UAV images and a total of 1316 untagged 5.7k frames extracted from a 360° video. The raw data of the drone GNSS receiver were used to compute the camera positions in PPK mode. A virtual master station computed in the area from the SPIN3 GNSS network of permanent stations (https://www.spingnss.it/) was used as base station. A set of 10 CPs was measured in RTK using the SPIN3 GNSS positioning service, with a baseline to the closest permanent station of approximately 15 km. Orientation results for this test (Figure 10a and Table 2) confirm those obtained in the "University Campus test" in terms of accuracy on CPs and camera positions. The slightly larger discrepancies can be attributed to the worse acquisition conditions and the longer RTK baseline. The reconstructed 3D model of the complex is presented in Figure 10b. Also in this case, complementing UAV and ground 360° images was important to obtain a more complete reconstruction of the entire scene: the cliff on which the church is built could be clearly surveyed from the drone, while the porch surrounding the south side of the church could be efficiently surveyed with ground 360° data. A specific issue observed in this test concerns the slightly different colour rendering of the two cameras. As anticipated, an automatic ISO and shutter speed profile was selected for video acquisition with the Insta360 ONE X2, which resulted in a different colour profile for the images acquired by the two sensors. This issue can be important if orthophotos or textured models are expected as survey products. Future work will address a colour calibration method for the two sensors aimed at minimizing colour differences.
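The role of the baseline length can be made concrete with the "fixed term + ppm" form in which GNSS manufacturers usually state RTK precision (1 ppm adds 1 mm per km of baseline). The 7 mm + 1 ppm values below are illustrative assumptions, not the specification of the receivers used here, and network RTK services such as SPIN3 partly reduce the distance-dependent term:

```python
def rtk_precision_mm(baseline_km: float, fixed_mm: float = 7.0, ppm: float = 1.0) -> float:
    """Nominal RTK precision in mm: fixed term plus ppm times baseline (1 ppm = 1 mm/km)."""
    if baseline_km < 0:
        raise ValueError("baseline must be non-negative")
    return fixed_mm + ppm * baseline_km


campus = rtk_precision_mm(0.1)   # ~100 m baseline of the University Campus CPs
hamlet = rtk_precision_mm(15.0)  # ~15 km baseline of the "San Michele" CPs
print(f"{campus:.1f} mm vs {hamlet:.1f} mm")  # 7.1 mm vs 22.0 mm
```

Under these assumed values, the nominal CP precision roughly triples between the two sites, which is compatible with the slightly larger discrepancies observed for "San Michele".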

"Rocca di Vogogna" dataset
The proposed approach was then tested on a cultural heritage monument, the Rocca di Vogogna, located in Vogogna (VB-Italy). The monument consists of the remains of a fortress built on top of a cliff to defend the Ossola Valley.

CONCLUSIONS AND FUTURE WORKS
The paper has presented an integrated approach using both ground (360° videos) and aerial (UAV photos) data to comprehensively reconstruct a small-town city centre and/or a cultural heritage area. The two data types are complementary and can increase the spatial resolution and coverage of the final 3D point cloud and orthophotos: the UAV captures extensive areas and topography, whereas ground-based 360° images concentrate on fine details and ground-level features, or on areas that cannot be sensed from an aerial perspective (e.g., porches, areas covered by vegetation, indoor spaces, etc.).
In the presented work, GNSS data acquired by the UAV were used for GNSS-assisted image orientation, with the aim of reducing or even avoiding, in specific situations, the need for GCPs. Camera positions were computed in either RTK or PPK mode. The accuracies obtained in the presented tests are similar to those achievable with a standard GCP-based approach (i.e., in the order of a few centimetres). Avoiding or reducing the number of GCPs and CPs speeds up the acquisition phase and reduces the time spent manually measuring GCPs on images during processing. While valid matches exist between UAV and ground 360° images, they are generally located at the boundaries (bottom and top) of the equirectangular projection of the spherical images. These are generally the areas with the largest distortion due to the equirectangular projection, which may lead to larger uncertainties in the matching phase. The impact of this factor on the definition of valid links will be investigated in future studies. A second aspect that needs further investigation is the chromatic calibration between the two sensors, aimed at minimizing colour differences. The paper explored applications in the cultural heritage domain, demonstrating the efficacy of the proposed approach in scenarios with complex architectures and inaccessible areas. In future research, we will explore other possible applications such as forestry documentation and risk assessment.
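The distortion at the top and bottom of the equirectangular image can be quantified. Every image row spans 360° of longitude, so content at latitude φ is stretched horizontally by a factor 1/cos φ with respect to the equator. The sketch below assumes a 5760 × 2880 frame for the 5.7k format and a camera frame with x to the right, y up and z forward; these conventions and the function names are assumptions made for illustration.

```python
import numpy as np


def pixel_to_direction(u, v, width, height):
    """Unit viewing direction for an equirectangular pixel (u: column, v: row)."""
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi    # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi   # latitude, +pi/2 at the top row
    return np.array([np.cos(lat) * np.sin(lon),      # x: right
                     np.sin(lat),                    # y: up
                     np.cos(lat) * np.cos(lon)])     # z: forward


def horizontal_stretch(v, height):
    """Factor by which the projection stretches scene content horizontally at row v."""
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return 1.0 / np.cos(lat)


W, H = 5760, 2880
for row in (H / 2 - 0.5, 479.5, 159.5):  # equator, about 60° and 80° latitude
    print(row, horizontal_stretch(row, H))
```

At ±60° latitude the stretch factor is 2, and it grows without bound towards the poles, which may contribute to the larger matching uncertainty expected for UAV-ground links located there.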

Figure 1. The Insta360 ONE X2 mounted on top of a geodetic pole equipped with an Emlid Reach RS2 antenna to collect geotagged 360° images.

Figure 3. Detail of a frame detected as "blurry" (top) substituted with a frame extracted 0.7 s later (bottom).

Figure 4. Emlid Studio 1.7, the software used for drone data geotagging with PPK.

Figure 5. Results of image orientation for the three presented tests: Test A (a), Test B (b) and Test C (c). Blue rectangles represent UAV images while blue spheres represent 360° images.
To validate the data, 11 Check Points were measured in RTK mode all around the building (expected precision 1-2 cm; the baseline to the closest permanent station is approximately 100 m).

Figure 6. Reconstructed point cloud: point cloud of the building from UAV and 360° images (a), north façade reconstructed from UAV images only (b) and from UAV + 360° ground data (c).

Figure 7. Comparison of the reconstructions with UAV only (top) and UAV + ground 360° images (bottom): detail of the north façade (a) and detail of the sports shelter under the tree (b).

A false-colour representation was employed to highlight discrepancies between the reference point cloud (from laser scanning) and the photogrammetric one. The comparison reveals an average discrepancy of 2.1 cm, and the overall accuracies agree well with the results obtained on Ground Control Points (GCPs) and Check Points (CPs). The discrepancies are evenly distributed over the model, revealing no systematic effects or bias in the photogrammetric dense point cloud.

Figure 8. Comparison between the laser scanning and the UAV + 360° image point clouds: discrepancies in metres shown as colours on the point cloud (top) and as a diagram of discrepancies (bottom); the vertical line is set at 2.0 cm, with 55% of the data below it.

Figure 9. The historical complex of San Michele.

Figure 10. Results of the "San Michele" dataset (historical complex of San Michele): image orientation of UAV + 360° images (a), where blue rectangles represent UAV images and red spheres represent 360° images, and reconstructed point cloud (b).

Figure 11. Results of image orientation for the "Rocca di Vogogna" case study with UAV + 360° images: side view (left) and top view highlighting the 360° images (right). Blue rectangles represent UAV images while red spheres represent 360° images.
This dataset is composed of 762 drone images and 2251 ground 360° frames. Orientation results are presented in Figure 11, while the obtained point cloud and orthophoto are presented in Figure 12. UAV data were particularly useful for surveying the south-east side of the fortress, which rises vertically over a cliff, whereas the ground data allowed detailed documentation of the dry-stone walls of the north-west side.

The entire survey was carried out in approximately 2 hours. Two tests were carried out for this case study:
• only UAV geotagged data (Test A);
• UAV geotagged + 360° untagged data (Test B).
Orientation results on CPs and camera positions are summarized in Table 2.

Table 2. Discrepancies on CPs and camera positions for the two tests ("San Michele" dataset).