DIRECT GEOREFERENCING APPROACHES FOR CLOSE-RANGE AND UAV PHOTOGRAMMETRY IN THE BUILT HERITAGE DOMAIN.

Direct georeferencing uses onboard sensors to measure the position and orientation of the camera during image acquisition for photogrammetric applications. This approach aims to eliminate the use of traditional Ground Control Points (GCPs) in the photogrammetric process, in order to reduce the costs and the time of survey operations. The direct georeferencing technique integrates measurements from Inertial Measurement Units (IMUs) and Global Navigation Satellite System (GNSS) receivers to estimate the position and attitude of the camera with high accuracy (a few centimeters). In the Built Heritage survey domain, this approach is mainly adopted with UAV (Uncrewed Aerial Vehicle) platforms, which are nowadays equipped with highly accurate systems able to estimate the exterior orientation parameters for the photogrammetric process. For terrestrial applications, only a few tests have been performed so far; moreover, the sensors currently available for close-range acquisition systems are limited and sometimes still under development. To evaluate the possibilities offered by these new direct georeferencing tools, a test of the 3D ImageVector (REDcatch GmbH) has been performed. The results and the strategies followed are presented and analyzed in order to better understand the accuracy and the potential of this promising approach.


INTRODUCTION
Nowadays, photogrammetry can be considered a consolidated technique for several applications, specifically in the Built Heritage documentation domain (Achille et al., 2018; Calantropio et al., 2018; Remondino & Stylianidis, 2016). Most of the photogrammetric pipeline has reached high levels of automation thanks to the research efforts of the Geomatics and Computer Vision communities over the last decade. Nevertheless, among the different phases of the process, the solution of the Exterior Orientation (E.O.) is still the one that requires significant intervention from the operator and a considerable amount of time. This phase (also called aerial triangulation or bundle block adjustment) is generally solved using a series of Ground Control Points (GCPs) and Check Points (CPs), and it affects both the acquisition and the processing phases. Indeed, during fieldwork, GCPs and CPs are measured with traditional topographic techniques such as Total Station (TS) or Global Navigation Satellite System (GNSS) receivers. Generally speaking, GCPs and CPs can be natural features of the surveyed asset or artificial coded targets (in the latter case, some time must also be dedicated to positioning them on the imaged object). Moreover, the position of all GCPs and CPs must be measured and documented (sketching), with additional effort for the operators. Furthermore, in the photogrammetric workflow, a large amount of time must be dedicated to identifying GCPs and CPs on the different images acquired for processing. This phase connects the image coordinates with the object coordinates; some solutions for automatically recognizing coded targets exist (e.g., Shortis & Seager, 2014), but they are not always feasible in the Built Heritage documentation domain.
Due to environmental constraints, it is not always possible to position and measure the control points: the safety of the operators involved (e.g., in an emergency context or in dangerous areas), the inaccessibility of the surveyed area, or simply limited resources in terms of time, operators, and funds are all elements that can affect this phase of the work (in the heritage documentation domain, it is sometimes not even possible to place targets on the asset to be surveyed). Generally speaking, georeferencing the photogrammetric block requires at least 4 GCPs (Bolkas, 2019). From an operational point of view, it is generally advisable to use more than 4 GCPs, to add redundancy to the dataset and to better estimate the Interior Orientation Parameters (I.O.P.) of the camera. Finally, another critical element is the number and spatial distribution of the control points: due to the environmental constraints reported above, it is not always possible to achieve an optimal GCP network.
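The value of extra GCPs can be made concrete with a simple count. As a minimal sketch, assuming the datum of the block is fixed through a 7-parameter similarity transformation (three translations, three rotations, one scale) and that every GCP is measured in all three coordinates, each point contributes three equations, so the redundancy is r = 3n − 7. Image observations are deliberately left out of this count, which only illustrates why using more points than the algebraic minimum is advisable.

```python
# Minimal sketch: redundancy of a 7-parameter similarity (Helmert) datum
# definition as a function of the number of full 3D GCPs.
# Assumption: every GCP is measured in X, Y and Z; image observations
# are not counted here.

SIMILARITY_PARAMS = 7  # 3 translations, 3 rotations, 1 scale factor


def datum_redundancy(n_gcps: int) -> int:
    """Return the number of redundant equations for n full 3D GCPs."""
    if n_gcps < 0:
        raise ValueError("n_gcps must be non-negative")
    return 3 * n_gcps - SIMILARITY_PARAMS


for n in (3, 4, 6, 10):
    print(n, "GCPs ->", datum_redundancy(n), "redundant equations")
```

With 3 points the transformation is solvable with little redundancy, whereas each added point contributes three more checks against blunders.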

Direct georeferencing approaches
To speed up the solution of the E.O. phase, the direct georeferencing approach has been developed over the years using different devices. This approach has been tested, developed, and enhanced particularly in Uncrewed Aerial Vehicle (UAV) photogrammetry, especially in the last few years, thanks to the availability of new aerial platforms with onboard high-accuracy GNSS antennas (high-accuracy dual-frequency, multi-constellation receivers are now available at a fair price) coupled with other sensors such as Inertial Measurement Units (IMUs). Combining these two types of sensors makes it possible to estimate the position and orientation of the platform at the time of acquisition, thus enabling a direct georeferencing approach. Several works on the UAV photogrammetric scenario are available in the scientific literature (Ekaso et al., 2020; López et al., 2022; Peppa et al., 2019; Stöcker et al., 2017), confirming that these methods can reach an accuracy of a few centimeters. In this domain, different strategies can be exploited to georeference the data, such as N-RTK (Network Real Time Kinematic), RTK (Real Time Kinematic), or PPK (Post-Processing Kinematic). In the PPK approach, the platform and camera positions are estimated after the acquisition phase, using the data recorded by the GNSS receiver during the acquisition. This method makes it possible to download and use precise ephemeris data and, in general, to achieve more accurate results than the other two approaches.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-M-2-2023 29th CIPA Symposium "Documenting, Understanding, Preserving Cultural Heritage: Humanities and Digital Technologies for Shaping the Future", 25-30 June 2023, Florence, Italy
In the RTK method, the coordinates derived from the measurements recorded by the onboard GNSS receiver are corrected using the information sent by a base station. In this approach, camera position and orientation are estimated in real time, and the GNSS base station is generally a receiver set up in the field. If a virtual station created by a network of CORSs (Continuously Operating Reference Stations) is used, the corrections are sent via NTRIP (Networked Transport of RTCM via Internet Protocol). This approach is called N-RTK, and it requires a stable radio connection as well as internet connectivity. All three approaches theoretically allow an accuracy of a few centimeters in the solution of the E.O. phase, and several studies are available in the literature (Benassi et al., 2017; Bolkas, 2019; Ekaso et al., 2020; Gabrlik, 2015; Rabah et al., 2018; Stöcker et al., 2017; Štroner et al., 2020; Tomaštík et al., 2019; Zhang et al., 2019). Despite the rich literature on these approaches in the UAV domain, some open issues remain, especially when transposing the same methods to terrestrial photogrammetric applications.
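In N-RTK, corrections are obtained from an NTRIP caster over an HTTP-style connection. A minimal sketch of an NTRIP 2.0 request, assuming a hypothetical caster host, mountpoint, and credentials; it only builds the request bytes and opens no connection. With VRS-type mountpoints, the client usually also has to send its approximate position as an NMEA GGA sentence after connecting, which is not shown. How the 3D ImageVector data logger implements its own client is not documented here.

```python
import base64


def build_ntrip_request(host: str, mountpoint: str, user: str, password: str) -> bytes:
    """Build an NTRIP 2.0 (HTTP/1.1-based) request for a caster mountpoint.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    # HTTP Basic authentication token: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("ascii")).decode("ascii")
    lines = [
        f"GET /{mountpoint} HTTP/1.1",
        f"Host: {host}",
        "Ntrip-Version: Ntrip/2.0",
        "User-Agent: NTRIP ExampleClient/0.1",
        f"Authorization: Basic {token}",
        "Connection: close",
        "",  # the two empty items produce the final blank line (CRLF CRLF)
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_ntrip_request("caster.example.com", "MOUNTPOINT", "user", "secret").decode())
```

The request itself is plain text; the correction stream (RTCM messages) follows on the same connection once the caster accepts it.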

MATERIALS AND METHODS
In the sphere of close-range photogrammetry, few studies address and solve the direct georeferencing problem, mainly due to the limited availability of Digital Single-Lens Reflex (DSLR) or mirrorless cameras equipped with positioning sensors of this kind.

The tested system
This paper deals with different tests conducted with the 3D ImageVector (https://www.redcatch.at/3dimagevector/) by REDcatch on Built Heritage assets. The 3D ImageVector system is composed of a multi-band, multi-constellation GNSS receiver, an IMU, and a data logger; it can be attached to different camera models and is connected via the hot shoe mount. The manufacturer claims a positioning accuracy of 2 cm + 1 ppm using the RTK/PPK (Real Time Kinematic/Post-Processing Kinematic) approach and 0.2° for the rotations (roll, pitch, and yaw) thanks to the embedded IMU (Figure 1). The 3D ImageVector records the GNSS position data (RINEX data for PPK, NMEA sentences for RTK) and determines the attitude angles via the IMU. Trigger information with a resolution of 30 ns (nanoseconds) is recorded for each image. As mentioned before, the system can be used in different configurations; in the tests performed, the N-RTK approach was followed. In this case, the data logger must be connected to a smartphone hotspot, which allows the system to receive real-time GNSS corrections from the selected GNSS network; for these tests, this was the HxGN SmartNet by Hexagon.
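The manufacturer's figures can be turned into an a priori error budget. A minimal sketch, assuming the usual reading of a "2 cm + 1 ppm" specification (a constant term plus 1 mm per km of distance from the reference station) and treating the 0.2° attitude accuracy as an angular error projected over the camera-to-object distance. The 10 km baseline and the 10 m distance in the usage lines are illustrative values. In a bundle adjustment, the measured angles usually enter as weighted observations, so the displacement below is an indication rather than a final error.

```python
import math


def gnss_spec_error_m(baseline_km: float, const_m: float = 0.02, ppm: float = 1.0) -> float:
    """'Constant + ppm' GNSS accuracy (m) for a given baseline length (km)."""
    # 1 ppm corresponds to 1 mm of error per km of baseline
    return const_m + ppm * 1e-6 * baseline_km * 1000.0


def attitude_displacement_m(distance_m: float, angle_deg: float = 0.2) -> float:
    """Lateral displacement (m) at the object caused by an attitude error."""
    return distance_m * math.tan(math.radians(angle_deg))


print(round(gnss_spec_error_m(10.0), 3))        # illustrative 10 km baseline
print(round(attitude_displacement_m(10.0), 4))  # illustrative object at 10 m
```

At close range the angular term can be comparable to, or larger than, the positional term, which is one reason the attitude accuracy matters for terrestrial direct georeferencing.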
Once the data-logger LED indicating the RTK status shows a fixed solution, image acquisition can start following the planned scheme. During data capture, all the acquired information is stored in the system. Some suggestions for a correct acquisition are the following: use smooth movements; do not rotate the camera (tilt and roll) more than 40° from the horizontal, since doing so causes the GNSS signal to be lost and the operator has to wait for re-initialization; while walking, keep the camera at an approximately constant height; and check the data logger to ensure that all the data are correctly acquired.
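The "fixed" RTK state corresponds, in the standard NMEA 0183 GGA sentence, to fix-quality value 4 (5 denotes an RTK float solution). A minimal sketch, assuming the NMEA stream recorded for RTK contains standard GGA sentences; it validates the checksum and checks the fix quality. The sample sentence in the usage lines is hypothetical, and how the 3D ImageVector LED logic is implemented is not known.

```python
from functools import reduce

RTK_FIXED = 4  # GGA fix-quality indicator for an RTK fixed solution
RTK_FLOAT = 5  # GGA fix-quality indicator for an RTK float solution


def nmea_checksum(body: str) -> str:
    """XOR of all characters between '$' and '*', as two uppercase hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"


def gga_fix_quality(sentence: str) -> int:
    """Validate a GGA sentence and return its fix-quality indicator (field 6)."""
    if not sentence.startswith("$") or "*" not in sentence:
        raise ValueError("not an NMEA sentence")
    body, checksum = sentence[1:].split("*", 1)
    if nmea_checksum(body) != checksum.strip().upper():
        raise ValueError("checksum mismatch")
    fields = body.split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    return int(fields[6])


def is_rtk_fixed(sentence: str) -> bool:
    return gga_fix_quality(sentence) == RTK_FIXED


# Hypothetical GGA body (time, lat, lon, quality=4, satellites, HDOP, heights...)
body = "GNGGA,123519,4807.038,N,01131.000,E,4,12,0.9,545.4,M,46.9,M,1.0,0001"
print(is_rtk_fixed(f"${body}*{nmea_checksum(body)}"))
```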

Case Studies and data acquisition
For this preliminary study, the system was tested with a mirrorless camera on two Built Heritage assets selected as case studies.
The aim was to verify whether the system is actually able to reach the claimed accuracies; if confirmed, these accuracies would be in line with the precision required by the main architectural representation scales and could speed up the overall acquisition and processing of terrestrial photogrammetric data.
The tested camera is a Sony α7R III equipped with a 24 mm lens. Acquisitions with the 3D ImageVector were performed following consolidated close-range photogrammetric schemes, with large overlaps (more than 80%) and varying camera angles with respect to the building (convergent images), in order to ensure that all areas of the object are captured. Moreover, UAV images were acquired at the same time as the close-range acquisition. For the Fantacasino case study (section 2.2.1), the data were acquired using a DJI Matrice 300 with the Zenmuse P1 camera (Figure 2-a), a full-frame, 45 MP CMOS sensor with a 35 mm lens. The Zenmuse L1 sensor was instead employed in the case study of the Arab bath of Mezzagnone (section 2.2.2); it integrates a Livox LiDAR module, a high-accuracy IMU, and a 20 MP camera with a 1-inch CMOS sensor (Figure 2-b). As is well known, the DJI Matrice 300 can perform direct georeferencing (Czyża et al., 2023; Kersten et al., 2022; Štroner et al., 2021); in the study areas, the N-RTK solution was selected, as for the acquisition with the 3D ImageVector. The idea behind these two types of acquisition was to evaluate the possibility of performing a complete photogrammetric survey (terrestrial and aerial) with a direct georeferencing approach, integrating the acquisitions performed with the UAV and with the traditional photographic camera. For the Fantacasino, three different acquisition schemes were flown with the Matrice: a traditional double-grid nadir acquisition, an oblique double-grid acquisition, and a circular flight.
For the Mezzagnone, two different acquisition schemes were followed: a traditional double-grid nadir acquisition and an oblique double-grid acquisition. To evaluate the accuracy of the close-range and UAV direct georeferencing approaches, first- and second-order topographic networks were established and measured for both case studies. For the Fantacasino, the first-order network was composed of 3 vertices measured with a traditional static GNSS approach; the second-order network was composed of 30 artificial markers measured with a TS. For the Arab bath of Mezzagnone, the first-order network was composed of 5 vertices measured with a traditional static GNSS approach; the second-order network was composed of 25 artificial markers measured with a TS. Each of the two networks was adjusted using three GNSS reference stations of the HxGN SmartNet network by Hexagon (https://hxgnsmartnet.com/).

The Fantacasino case study
The first area on which the different tests were performed is the Fantacasino building (Figure 3), the attraction of the Leisure Grove in the Garden of the Reggia di Venaria Reale (Turin, Italy). It was designed for children, families, and visitors of all ages; its form and size are inspired by the original Temple of Diana, and it offers a modern-day interpretation of the gardens as a sort of playground. The building was built in 2010 during the restoration works of the area and is now under renovation due to extensive decay of its wooden structure. For the terrestrial acquisition, the dataset was obtained using the Sony α7R III (full-frame camera with a pixel size of 4.51 µm), equipped with a 24 mm lens. Given the circular shape of the Fantacasino, the "center recording" acquisition strategy was adopted, with significant overlap (80%) at a distance of 10 m from the object (Figure 4-b). A total of 65 images were acquired with the Sony α7R III, with a mean GSD of 1.2 mm/pixel.
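The GSD and overlap figures above follow from similar triangles: GSD = p·D/f (pixel size p, camera-to-object distance D, focal length f), and the same relation gives the image footprint and the spacing between stations for a target overlap. A minimal sketch, assuming the 4.51 µm pixel and 24 mm lens stated above and a sensor width of 35.9 mm for the Sony α7R III (a spec-sheet value not given in the paper). The mean GSD reported by the processing software depends on the actual distances across the block, so it need not coincide with the nominal value at 10 m.

```python
def gsd_mm(pixel_um: float, distance_m: float, focal_mm: float) -> float:
    """Nominal ground sampling distance (mm/pixel) for a frame camera."""
    # units: um * m / mm = (1e-6 m * m) / 1e-3 m = 1e-3 m = mm
    return pixel_um * distance_m / focal_mm


def footprint_m(sensor_mm: float, distance_m: float, focal_mm: float) -> float:
    """Object-space extent (m) covered by one sensor dimension."""
    return sensor_mm * distance_m / focal_mm


def baseline_m(sensor_mm: float, distance_m: float, focal_mm: float, overlap: float) -> float:
    """Spacing between consecutive stations (m) for a given overlap (0-1)."""
    return footprint_m(sensor_mm, distance_m, focal_mm) * (1.0 - overlap)


print(round(gsd_mm(4.51, 10.0, 24.0), 2))             # nominal GSD at 10 m
print(round(baseline_m(35.9, 10.0, 24.0, 0.80), 2))   # station spacing, 80% overlap
```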

Thermal bath of Mezzagnone case study
The Mezzagnone Bath ( The aerial acquisition was completed with the DJI Matrice 300 equipped with the Zenmuse L1. Both flights were performed at a flight altitude of 30 m (Figure 6-a); 367 images were acquired, and the final mean GSD was 7.3 mm/pixel. Although the Zenmuse L1 is a LiDAR (Light Detection And Ranging) instrument, its RGB sensor has sufficient resolution for the acquired images to be used in a photogrammetric approach as well; in the standard processing pipeline, the images are used to transfer RGB information to the LiDAR point cloud.
The terrestrial acquisition was again performed with the Sony α7R III (full-frame camera with a pixel size of 4.51 µm), equipped with a 24 mm lens. The acquisition strategy (Figure 6-b) again followed consolidated approaches for single isolated buildings, with a circular acquisition. In this case too, images were acquired both with the main camera axis perpendicular to the building and with this axis inclined by about 45°. A total of 189 images were acquired with a mean GSD of 1.1 mm/pixel.

DATA PROCESSING AND ANALYSIS
All the photogrammetric processing was carried out with the commercial software Agisoft Metashape version 2.0. Several analyses were conducted on the data during and after processing, and the measured topographic points were used as independent CPs to evaluate the accuracy of direct georeferencing for all the processing strategies. First, the lever arm of the employed camera was evaluated to obtain correct results from the data acquired by the 3D ImageVector. This value defines the offset between the antenna's phase center and the center of the camera sensor. The manufacturer provides the offsets between the GNSS phase center and the central pin of the hot shoe on which the system is installed: 8.3 cm in Y, 1.5 cm in X, and 0 in Z, according to the reference system shown in Figure 7.
In addition to the sensor lever arm, the offsets between the hot shoe pin of the employed camera and the camera's center must also be measured. This was carried out manually using a digital caliper. Regarding the Matrice 300 data, the N-RTK approach was followed for both sensors as well, using the same GNSS network for the real-time corrections. In this case, all the offsets between the phase center of the onboard GNSS antenna and the camera center are automatically applied to the images during geotagging, so no further intervention is required from the operator. Different tests and analyses were performed on the available data. Four main processing strategies were followed for the terrestrial data. First, the geotagged terrestrial images were used alone, and the accuracy evaluation was carried out using the measured points as CPs (29 points
Moreover, the issues connected with camera calibration were also considered through the three other processing strategies: a pre-calibration, or a self-calibration with a small number of GCPs (1 and 3).
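The two offsets described above (antenna phase center to hot shoe pin, and hot shoe pin to camera center) add up to a single lever arm, which must be rotated by the camera attitude before being applied to the antenna position. A minimal sketch, assuming a Z-Y-X (yaw, pitch, roll) rotation convention and hypothetical signs and values: the actual axes follow Figure 7, and the caliper measurements are not reported in the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]

# Hypothetical values in metres; axes and signs depend on the frame of
# Figure 7 and on the camera body, so they are assumptions.
ANTENNA_TO_PIN: Vec3 = (0.015, 0.083, 0.0)    # magnitudes from the manufacturer
PIN_TO_CAMERA: Vec3 = (0.0, -0.060, -0.030)   # hypothetical caliper measurement


def add(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(m: Mat3, v: Sequence[float]) -> Vec3:
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def rotation_zyx(yaw_deg: float, pitch_deg: float, roll_deg: float) -> Mat3:
    """Body-to-world rotation R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed convention)."""
    y, p, r = (math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg))
    rz = [[math.cos(y), -math.sin(y), 0.0], [math.sin(y), math.cos(y), 0.0], [0.0, 0.0, 1.0]]
    ry = [[math.cos(p), 0.0, math.sin(p)], [0.0, 1.0, 0.0], [-math.sin(p), 0.0, math.cos(p)]]
    rx = [[1.0, 0.0, 0.0], [0.0, math.cos(r), -math.sin(r)], [0.0, math.sin(r), math.cos(r)]]
    return matmul(rz, matmul(ry, rx))


def camera_center(antenna_world: Vec3, rotation: Mat3, lever_body: Vec3) -> Vec3:
    """Camera center = antenna phase center + R * (antenna-to-camera lever arm)."""
    return add(antenna_world, matvec(rotation, lever_body))


# Total antenna-to-camera offset in the body frame
LEVER_ARM = add(ANTENNA_TO_PIN, PIN_TO_CAMERA)

antenna = (395000.000, 4990000.000, 250.000)  # hypothetical projected coordinates (m)
print(camera_center(antenna, rotation_zyx(90.0, 0.0, 0.0), LEVER_ARM))
```

Because the lever arm is rotated with the camera, a few centimeters of offset can shift the camera center differently in each image, which is why it cannot simply be subtracted as a constant.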

Terrestrial processing only with geotag
In this approach, all the available control points were used as CPs to evaluate the accuracy of the terrestrial direct georeferencing process. A preliminary step before processing is the geotagging of the acquired images. This step is completed with REDtoolbox, the software associated with the 3D ImageVector solution. The software is dedicated to post-processing GNSS data and was used to process the satellite and IMU observations together with the acquired images. The trigger files saved in the data logger of the 3D ImageVector are used to associate these two types of observations with the acquired images, which are then saved with their respective geotags.
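The association between trigger events and GNSS observations can be illustrated by interpolating the recorded trajectory at each trigger time. This is a minimal sketch of the general technique, not the REDtoolbox algorithm (which is not described in the paper); it assumes time-sorted positions in a projected system (metres) and hypothetical image names, and it leaves out attitude, whose angles would need wrap-around handling.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float, float]  # (time_s, E, N, h)
Position = Tuple[float, float, float]


def interpolate_at(trajectory: List[Sample], t: float) -> Position:
    """Linearly interpolate E, N, h at trigger time t from a time-sorted trajectory."""
    times = [s[0] for s in trajectory]
    if not times or t < times[0] or t > times[-1]:
        raise ValueError("trigger time outside the trajectory")
    i = bisect_left(times, t)
    if times[i] == t:
        return tuple(trajectory[i][1:])
    t0, *p0 = trajectory[i - 1]
    t1, *p1 = trajectory[i]
    w = (t - t0) / (t1 - t0)  # fractional position between the two samples
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


def geotag(trajectory: List[Sample], triggers: Dict[str, float]) -> Dict[str, Position]:
    """Assign an interpolated position to every image from its trigger time."""
    return {image: interpolate_at(trajectory, t) for image, t in triggers.items()}


track = [(0.0, 0.0, 0.0, 0.0), (1.0, 10.0, 20.0, 30.0), (2.0, 10.0, 20.0, 40.0)]
print(geotag(track, {"IMG_0001.JPG": 0.5, "IMG_0002.JPG": 1.5}))
```

The fine trigger resolution mentioned earlier matters here: with the camera moving at walking speed, a timing error translates directly into a position error along the path.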
Analyzing the processing, it is worth underlining that during acquisition the system is not able to add the correct geotag to each image immediately; a short post-processing step is needed. The images are stored in the internal memory of the camera, while (thanks to the hot shoe pin) the position (N-RTK) and attitude (IMU) data are stored in the data logger. The correct position and attitude are finally assigned to each image with the REDtoolbox software. For this reason, the 3D ImageVector approach could more properly be defined as quasi-N-RTK. Some screenshots of the workflow and the results of the processing are reported in Figure 8 and Figure 9. After this pre-processing, the images were processed in Agisoft Metashape following a consolidated workflow, using the measured markers as Check Points. In this approach, the I.O. and camera calibration parameters were estimated via self-calibration. The Root Mean Square errors (RMSe) on CPs for both the Fantacasino and Mezzagnone case studies are reported in Table 2.
Table 2. RMSe on CPs for the geotag-only strategy
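The accuracy figures used throughout the analysis are RMSe values on CPs. A minimal sketch of the computation, assuming that residuals are taken as estimated minus surveyed coordinates in the same projected reference system; the marker names are hypothetical. It returns per-axis, planimetric, and 3D values, comparable to the X, Y, Z, and total errors typically listed by photogrammetric software.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float, float]


def check_point_rmse(estimated: Dict[str, Point], surveyed: Dict[str, Point]) -> Dict[str, float]:
    """RMSe per axis, planimetric (XY) and 3D (XYZ) over the common check points."""
    common = sorted(set(estimated) & set(surveyed))
    if not common:
        raise ValueError("no common check points")
    sums = [0.0, 0.0, 0.0]
    for name in common:
        for k in range(3):
            d = estimated[name][k] - surveyed[name][k]  # residual on axis k
            sums[k] += d * d
    n = len(common)
    rx, ry, rz = (math.sqrt(s / n) for s in sums)
    return {
        "X": rx,
        "Y": ry,
        "Z": rz,
        "XY": math.hypot(rx, ry),
        "XYZ": math.sqrt(rx * rx + ry * ry + rz * rz),
    }


est = {"M01": (100.031, 200.012, 50.045), "M02": (110.018, 205.040, 51.020)}
ref = {"M01": (100.000, 200.000, 50.000), "M02": (110.000, 205.000, 51.000)}
print(check_point_rmse(est, ref))
```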

Processing with geotag and camera pre-calibrated
The second processing strategy involved the use of an available set of I.O. parameters for the employed camera. For this strategy, the images geotagged in the pre-processing phase were used together with a pre-calibration solution. The calibration parameters were estimated from another dataset acquired with the same camera, using a set of control points in a self-calibration solution. The parameters estimated in this way were then used in the projects of the two selected case studies as calibration certificates, fixing the camera I.O. parameters. The results achieved with this strategy are reported in Table 3.
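Fixing the I.O. parameters means that the focal length, principal point, and distortion coefficients are held constant in the adjustment instead of being estimated. A minimal sketch of what these parameters do, using a generic Brown–Conrady frame-camera model with OpenCV-style tangential terms and an absolute principal point in pixels; Metashape's own parameterization differs in details (for example, principal point offsets from the image center and affinity terms). All numeric values in the usage lines are hypothetical, including the 7952 × 5304 px image size.

```python
from typing import Tuple


def project_brown(point_cam: Tuple[float, float, float], f_px: float, cx_px: float,
                  cy_px: float, k1: float = 0.0, k2: float = 0.0, k3: float = 0.0,
                  p1: float = 0.0, p2: float = 0.0) -> Tuple[float, float]:
    """Project a camera-frame point to pixel coordinates with Brown-Conrady distortion."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point behind the camera")
    x, y = X / Z, Y / Z                     # normalized image coordinates
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    # tangential (decentering) terms, OpenCV-style convention (assumption)
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return f_px * xd + cx_px, f_px * yd + cy_px


f_px = 24.0 / 0.00451                       # 24 mm lens, 4.51 um pixel -> focal in pixels
print(project_brown((1.0, 0.5, 10.0), f_px, 3976.0, 2652.0, k1=-0.05, k2=0.1))
```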

Processing with geotag, self-calibration and 1 or 3 GCPs
The last two strategies for processing the terrestrial images combined a self-calibration approach with one or more GCPs. Previous research (Teppati Losè et al., 2020) has shown that using a few GCPs can enhance the self-calibration phase and lead to more accurate estimates of the I.O. parameters of the camera. Two configurations were tested following this strategy: 1 GCP (results in Table 4) and 3 GCPs (results in Table 5); all the remaining control points were used as CPs.
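The paper does not state how the 1 or 3 GCPs were chosen among the markers. Purely as an illustration of how spatial distribution can be taken into account, a hypothetical rule: start from the marker closest to the centroid, then repeatedly add the marker farthest from those already selected (farthest-point sampling), and keep all others as CPs. This is a minimal sketch under that assumption, not the selection used in the tests.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def split_gcp_cp(points: Dict[str, Point], n_gcp: int) -> Tuple[List[str], List[str]]:
    """Pick n_gcp well-spread markers as GCPs; the remaining markers become CPs."""
    names = sorted(points)
    if not 0 < n_gcp <= len(names):
        raise ValueError("n_gcp must be between 1 and the number of markers")
    centroid = [sum(points[n][k] for n in names) / len(names) for k in range(3)]
    # first GCP: the marker closest to the centroid of the network
    gcps = [min(names, key=lambda n: math.dist(points[n], centroid))]
    while len(gcps) < n_gcp:
        # next GCP: the marker whose nearest already-selected GCP is farthest away
        candidates = [n for n in names if n not in gcps]
        gcps.append(max(candidates,
                        key=lambda n: min(math.dist(points[n], points[g]) for g in gcps)))
    cps = [n for n in names if n not in gcps]
    return gcps, cps


markers = {"M01": (0.0, 0.0, 0.0), "M02": (5.0, 0.0, 1.0),
           "M03": (10.0, 0.0, 0.0), "M04": (5.0, 8.0, 2.0)}
print(split_gcp_cp(markers, 3))
```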

UAV data processing
The processing of the two UAV datasets was straightforward, following a consolidated workflow in Agisoft Metashape. The images acquired by the DJI Matrice 300 with the Zenmuse P1 and L1 sensors are already correctly geotagged when an N-RTK approach is used for data acquisition. All the available control points were thus used as CPs. The processing results for both case studies are reported in Table 6.

DISCUSSION AND CONCLUSIONS
To draw some conclusions, the results achieved on the CPs with the different processing strategies are reported in Table 7 and Figure 10 for the terrestrial datasets. For both case studies, the first approach, using only the information derived from the geotagged images, delivered similar results, with a mean RMSe below 9 centimeters. The second approach, with the pre-calibration certificate of the camera, has only a marginal impact on the RMSe for the Fantacasino case study, while no changes can be observed for the Mezzagnone case study. Similar results can be observed for the approach with 1 GCP, with slightly better results for the Mezzagnone case study. Finally, as expected, the last approach with 3 GCPs gives the best performance and the lowest RMSe of all the processing strategies. These results are probably related to two main factors: the geometry of the acquisition and, consequently, the estimation of the camera I.O.P. during self-calibration. For both case studies, a large overlap between the acquired images and a variety of camera angles were ensured during acquisition. Thanks to this, the self-calibration was able to deliver results similar to those achieved with the pre-calibration approach.

Figure 10. Graphical representation of the mean RMSe on CPs for the terrestrial data processing strategies.
A single GCP was not enough to enhance the estimation of the camera parameters, while using at least 3 GCPs led to better results in this phase. However, more tests and experiments are needed to investigate this specific issue further. Generally speaking, the results achieved with all the different strategies can be considered satisfactory for the needs of the documentation process in the Cultural Heritage domain. Of course, these results must be related to the required representation scale and to the accuracies required for the different metric products to be delivered after the survey.
Considering the conventional accuracy for the standard representation scales, the data acquired and processed are not suitable for traditional architectural and archaeological representation in either scenario (only the N-RTK geotag, or a few GCPs), since in these cases the objective is an accuracy below 2 cm, so that traditional drawings can be extracted at representation scales between 1:50 and 1:100 (with a precision between 1 and 2 cm). These results are confirmed at least by the two selected case studies, but further tests are needed to see whether they hold in general in different application scenarios. Finally, since a complete survey of a Built Heritage asset today requires the integration of terrestrial and aerial data, the accuracy evaluation was also carried out by adding the data acquired by the UAV to the close-range acquisition. All four processing strategies were tested, and a summary of the results is reported in Table 8 and Figure 11. A significant improvement in the RMSe on CPs can be observed when terrestrial and aerial data are processed together (Table 8). In this case, the first approach, using only the images geotagged via N-RTK, delivered an error of less than 5 centimeters for the Fantacasino and of 6 centimeters for the Mezzagnone.
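The link between representation scale and required accuracy follows the conventional graphical error of about 0.2 mm at drawing scale, which yields the 1–2 cm range quoted above for 1:50–1:100. A minimal sketch, assuming that convention and comparing it directly with a single CP RMSe value; the list of scales and the 5 cm input in the usage line are illustrative.

```python
from typing import List, Sequence

GRAPHIC_ERROR_MM = 0.2  # conventional graphical error at drawing scale (assumption)


def tolerance_m(scale_denominator: int, graphic_error_mm: float = GRAPHIC_ERROR_MM) -> float:
    """Object-space tolerance (m) for a drawing at scale 1:scale_denominator."""
    return graphic_error_mm * scale_denominator / 1000.0


def suitable_scales(rmse_m: float, scales: Sequence[int] = (20, 50, 100, 200, 500)) -> List[int]:
    """Scale denominators whose tolerance is not exceeded by the given RMSe."""
    return [s for s in scales if rmse_m <= tolerance_m(s)]


for s in (50, 100):
    print(f"1:{s} -> {tolerance_m(s):.3f} m")
print(suitable_scales(0.05))  # illustrative RMSe of 5 cm
```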

In contrast with the terrestrial dataset, using three GCPs in the integrated processing only affects the Mezzagnone dataset, with an RMSe on CPs of around 4 centimeters. The introduction of aerial images strengthened the geometry of the photogrammetric blocks and allowed a better refinement of the position and orientation of the terrestrial dataset, improving the results of the overall processing and bringing them closer to those required for architectural and archaeological representation scales.
Finally, another aspect to consider concerns the best practices for data acquisition with the 3D ImageVector. Further research should also be dedicated to this phase, which requires a minimum of training for the operator in charge of data acquisition. At the same time, the possibility offered by the system of using a PPK approach needs to be explored as well.
As preliminary results of this research, it is possible to state that direct georeferencing approaches for terrestrial photogrammetric datasets are developing rapidly and are reaching accuracies similar to those of the solutions already in use for aerial acquisition.
A step forward in this direction concerns the use of this solution for smartphone-based photogrammetry. Mobile devices equipped with systems similar to the 3D ImageVector could lead to a wider use of directly georeferenced terrestrial datasets, with interesting results. Besides the solution of the E.O. phase, there are other issues and research topics connected to the use of smartphone-acquired images in photogrammetric approaches; however, these require dedicated research and specific experiments.