A UAV PHOTOGRAPHIC PATH PLANNING METHOD FOR HIGH-QUALITY RECONSTRUCTION OF CULTURAL HERITAGE

The image-based reconstruction method can preserve geometric and textural information with relatively high accuracy, making it a suitable method for digitally documenting cultural heritage. However, the quality of the reconstructed model largely depends on the quality of the captured images. Unmanned aerial vehicles (UAVs) equipped with a camera and gimbal offer great convenience for image acquisition in 3D reconstruction, but ensuring safety, high efficiency, and full coverage remains a challenge. To address this, we propose a UAV photographic path planning method for efficient and automatic image acquisition of heritage scenes, based on which high-quality reconstruction is realized. An a priori proxy of the scene is obtained in advance and used to (1) generate initial viewpoints for subsequent optimization; (2) generate the Shaped Digital Surface Model (SDSM) for obstacle avoidance, signal analysis, and sight occlusion judgment; and (3) segment planar regions from which representative points are sampled to measure the reconstructability of the heritage scene and optimize the viewpoints. Our method plans regular and safe final paths for the high-quality reconstruction of cultural heritage, outperforming both commercial software and state-of-the-art methods in real and virtual scenes.


INTRODUCTION
In recent years, the demand for documentation of cultural heritage has been constantly increasing (Aicardi et al., 2018; Gomes et al., 2014; Murtiyoso and Grussenmeyer, 2017). The image-based reconstruction method preserves geometric and textural information at a low cost and has become an important tool for heritage documentation (El-Hakim et al., 2004; Liu et al., 2022). Now that image-based 3D reconstruction algorithms have matured, image quality has become the main factor affecting reconstruction quality. Insufficient coverage can result in mismatches between images and holes in reconstructed models (Furukawa and Hernández, 2015; Schönberger et al., 2016), while excessively redundant images increase the time and computational cost of image acquisition and reconstruction, and can even lead to poor reconstruction quality (Zhang et al., 2021). Considering the complexity of cultural heritage scenes, how to capture images is an essential issue in their reconstruction.
Currently, commonly used image acquisition methods for large scenes include conventional vertical aerial photography and oblique photography (Dahlke et al., 2015; Nesbit and Hugenholtz, 2019). For flight safety, both methods usually follow a regular grid, zigzag or circular path at a certain distance above the scene. They are mostly carried out as a 2D horizontal flight or a 2.5D terrain-following flight (Liu et al., 2022; Pepe et al., 2018). Although oblique photography uses a tilted camera module with multi-directional lenses, the closer a surface is to the bottom of the scene, the more texture is missing or distorted in the tilted images. In addition, the occlusion and loss of detail caused by remote photography are not addressed (Toschi et al., 2017). High-quality 3D reconstruction requires complete coverage of all details of the heritage scene, so 3D close-up photography methods that obtain complete information from multiple angles are better suited (Schönberger et al., 2016; Seitz et al., 2006). Currently, multi-rotor UAVs equipped with an RTK (Real Time Kinematic) module and a gimbal provide the hardware basis for close-up photography and automatic image collection (Koch et al., 2019; Li et al., 2017; Nex and Remondino, 2014).
However, in practical operations most 3D UAV flight paths are flown under manual control or in predefined flight modes. Manual control requires the UAV operator to capture images with full coverage while guaranteeing safety, which is a complex and challenging task (Zhang et al., 2021). In this situation, more images than necessary tend to be captured, causing redundancy and long acquisition times. Thus, path planning for automatic UAV photography in 3D space is vital for realizing high-quality 3D reconstruction (Hepp et al., 2019; Roberts et al., 2017). The photographic positions and orientations must be planned according to the requirements of 3D reconstruction (Smith et al., 2018; Zhou et al., 2020).
In this paper, we propose a UAV photographic path planning method that completes image acquisition efficiently and automatically with safety guaranteed, based on which high-quality and efficient reconstruction of cultural heritage is realized. To verify the effectiveness of our method, we compare it with state-of-the-art methods in virtual scenes and with commercial software in real scenes. The experiments show that our method provides a reliable and efficient solution for the high-quality reconstruction of cultural heritage from the aspect of data acquisition.

METHODS
In contrast to real-time and online planning methods that do not require a proxy, the photographic planning method discussed in this paper requires an a priori proxy of the heritage scene. The overall process from photographic planning and image acquisition to the final 3D reconstruction is complex. We summarize the process in Eq. (1). First, according to the required model resolution, the appropriate image acquisition parameters $paras$ need to be set, including the camera parameters and the ground sample distance (GSD), which corresponds to the photographic distance. Then, based on the proxy $P_S$ of the heritage scene $S$, the viewpoint set is planned and connected into a collision-free path for image acquisition, from which the model $M$ is reconstructed.
We focus on path planning to improve image acquisition efficiency while ensuring flight safety and high-quality reconstruction. The workflow of our method is shown in Fig. 1.

$$M = \mathrm{Recon}\big(\mathrm{Capture}\big(\mathbf{Plan}(P_S,\ paras)\big)\big) \tag{1}$$

The a priori proxy, that is, the coarse mesh of the scene, is used to (1) generate initial viewpoints which are regularly arranged; (2) generate the SDSM for obstacle avoidance and signal analysis of the initial viewpoint set, so that unsafe viewpoints can be deleted; and (3) segment planar regions from which representative points are sampled to measure the reconstructability of the heritage scene. Based on the reconstructability of the sample points and its distribution, the viewpoints that contribute most to the reconstruction are added to the optimized set during the iterations.
Figure 1. The workflow of the UAV path planning method for high-quality reconstruction.
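The correspondence between the GSD and the photographic distance mentioned above follows from the pinhole camera model. The snippet below is a minimal sketch of that relation, not the authors' implementation; the function name and the camera values in the usage example (focal length, sensor width, image width) are illustrative assumptions that do not come from the paper.

```python
def photographic_distance(gsd_m, focal_length_mm, sensor_width_mm, image_width_px):
    """Distance (m) at which one pixel covers gsd_m metres on a fronto-parallel surface.

    Pinhole model: GSD = distance * pixel_size / focal_length.
    All camera parameters are caller-supplied assumptions.
    """
    pixel_size_mm = sensor_width_mm / image_width_px  # physical size of one pixel
    return gsd_m * focal_length_mm / pixel_size_mm


# Illustrative values only (assumed camera: 8.8 mm lens, 13.2 mm wide sensor, 5472 px wide image)
distance = photographic_distance(gsd_m=0.006, focal_length_mm=8.8,
                                 sensor_width_mm=13.2, image_width_px=5472)
print(round(distance, 3))
```

Because the relation is linear, halving the required GSD halves the photographic distance for the same camera.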

Viewpoint Generation for Flight Safety and Continuity
Priori Geometric Proxy
Similar to the methods in (Smith et al., 2018; Zhang et al., 2021; Zhou et al., 2020), our method requires an initial proxy that serves as the data basis for the entire path planning process, including sample point and viewpoint generation, obstacle avoidance, and occlusion judgment. The initial proxy requires a relatively accurate absolute position to ensure flight safety and a one-to-one correspondence between the planned viewpoints and the actual photographic positions. Common proxies include point clouds (Yan et al., 2021), DEMs (Smith et al., 2018), 2.5D coarse models (Zhang et al., 2021; Zhou et al., 2020), BIM, and 3D meshes obtained by oblique photography (Li et al., 2023). A 3D coarse mesh is usually adopted as the initial proxy, as shown in Fig. 1; it can be reconstructed from images captured over the scene along a regular path.
Shaped DSM
Both the generated viewpoints and the final paths connecting them must take obstacle avoidance and occlusion into account. In addition, when judging the visibility of a target point in the scene from a viewpoint, the occlusion of the sight must also be considered. Based on the a priori proxy of the scene, obstacle avoidance and occlusion can be determined from its geometric structure. The proxy represents the surface to be reconstructed, which is also treated as the obstacle. Theoretically, to judge whether a sight or path is occluded, it is only necessary to determine whether it intersects the triangular faces of the proxy. However, although the a priori model is relatively coarse, the number of faces is still large, especially in large scenes, and repeated ray-triangle intersection tests are computationally intensive. The subsequent signal analysis is also difficult to achieve through intersection tests. Thus, we convert the proxy to an SDSM (Shaped Digital Surface Model) by ray intersection with the proxy for quick judgment, as shown in Fig. 3. The maximum elevation of every 1 * 1 grid cell of the SDSM is retained. In addition, we filter the elevation of the SDSM by the maximum value according to the ideal signal mechanism with the empirical value 3:1. That is, when the elevation of a certain grid of the SDSM is 3 m, viewpoints within its  m range should be higher than 3 3 *  m. The elevation of the SDSM represents the minimum flight height of the corresponding range. Based on the SDSM, obstacle avoidance and RTK signal analysis are considered to ensure flight safety and continuity.
A viewpoint is outside the obstacles and in an area with good RTK signal if its elevation is higher than the elevation of the SDSM at the same coordinates on the XOY plane, as shown in Fig. 4. For sight and path occlusion judgment, points are sampled along the sight or path. If the elevation of every sampled point is higher than the SDSM, the sight or path is considered unoccluded, as shown in Fig. 4.
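The two SDSM tests described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the SDSM is assumed to be a row-major grid of elevations with its origin at (0, 0), and the names `sdsm_height`, `is_safe` and `segment_is_clear`, the cell size and the sampling step are all hypothetical.

```python
import math


def sdsm_height(sdsm, x, y, cell=1.0):
    """Elevation of the SDSM cell containing (x, y); sdsm[row][col] with row along y, col along x."""
    col = math.floor(x / cell)
    row = math.floor(y / cell)
    if 0 <= row < len(sdsm) and 0 <= col < len(sdsm[0]):
        return sdsm[row][col]
    return float("-inf")  # assumption: nothing to collide with outside the grid


def is_safe(sdsm, point, cell=1.0):
    """A point is safe if it lies strictly above the SDSM at its XY position."""
    x, y, z = point
    return z > sdsm_height(sdsm, x, y, cell)


def segment_is_clear(sdsm, start, end, step=0.5, cell=1.0):
    """Sample points every `step` along start->end and require each to be above the SDSM."""
    n = max(1, math.ceil(math.dist(start, end) / step))
    for i in range(n + 1):
        t = i / n
        p = tuple(start[k] + t * (end[k] - start[k]) for k in range(3))
        if not is_safe(sdsm, p, cell):
            return False
    return True


# Toy SDSM: a 5 m high block in the middle of flat ground
grid = [[0, 0, 0, 0],
        [0, 5, 5, 0],
        [0, 0, 0, 0]]
print(segment_is_clear(grid, (0.5, 1.5, 2.0), (3.5, 1.5, 2.0)))  # passes through the block
print(segment_is_clear(grid, (0.5, 1.5, 6.0), (3.5, 1.5, 6.0)))  # flies over the block
```

For a sight line that ends on a surface sample point, the last samples next to the target would have to be excluded or given a tolerance, since the target itself lies on the SDSM; that detail is left out of this sketch.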

Viewpoint Optimization for Efficient Image Acquisition and Reconstruction
Reconstruction Heuristics
Viewpoint optimization requires measuring the information gained from each viewpoint as well as the reconstructability of the scene. The common solution is to evaluate, based on reconstruction heuristics, the reconstructable value of dense sample points sampled on the proxy, which together represent the reconstructability of the whole scene.
The heuristic proposed by (Smith et al., 2018) is widely used; it takes shooting distance, observation angle, parallax angle and multi-view observation into account. It assigns each sample point a reconstructable value with respect to the viewpoint set. According to (Liu et al., 2021; Smith et al., 2018; Zhou et al., 2020), when the reconstructable value of a sample point exceeds the minimum threshold, which is set to 1.3, the sample point is considered to meet the reconstructability. If it exceeds the maximum threshold, which is set to 5.0, redundancy exists in the viewpoints visible to it.
Representative Sampling
Many methods such as (Hepp et al., 2019; Roberts et al., 2017; Smith et al., 2018) have attempted to propose more reliable reconstruction heuristics, establishing a strong correlation between the reconstructable value and the reconstruction accuracy of sample points. In contrast, we focus on generating typical sample points that are more representative of the scene reconstructability.
Planar segmentation clusters triangular faces on the a priori proxy that are adjacent and oriented in approximately the same direction, and it also allows the edges between faces with large variations in normal vectors to be extracted. Therefore, the proxy is first partitioned into planar regions by the segmentation method of (Bouzas et al., 2020), and geometric primitives are then extracted, including polygons (representing the planar regions) and lines (representing the edges between planar regions). Every triangular face on the proxy belongs to a particular primitive. Even when the heritage scene is complex, the extracted primitives still reflect the geometric structure of the scene well. We then uniformly sample points on the geometric primitives. The normal vector of a sample point on a polygon primitive is that of the segmented planar region, while the normal vector of a sample point on a line primitive is the average of the normals of its two adjacent regions.
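The sampling rule for line primitives can be illustrated with a short sketch. It assumes the planar segmentation has already produced an edge with its two adjacent region normals; the function names and the sampling interval are hypothetical and not taken from the paper.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length (assumes a non-zero vector)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def sample_line_primitive(p0, p1, normal_a, normal_b, interval):
    """Uniformly sample an edge and give every sample the averaged normal of its two regions.

    Assumes the two region normals are not exactly opposite, otherwise the average is undefined.
    """
    avg = normalize(tuple(a + b for a, b in zip(normalize(normal_a), normalize(normal_b))))
    n = max(1, round(math.dist(p0, p1) / interval))  # number of intervals along the edge
    points = [tuple(p0[k] + (i / n) * (p1[k] - p0[k]) for k in range(3)) for i in range(n + 1)]
    return [(p, avg) for p in points]


# Edge between a flat roof (normal +Z) and a wall facing -Y, sampled every 1 unit
samples = sample_line_primitive((0, 0, 3), (4, 0, 3), (0, 0, 1), (0, -1, 0), interval=1.0)
print(len(samples), samples[0][1])
```

The averaged normal points diagonally away from both regions, so a viewpoint that sees the edge sample well tends to see both adjacent planes at an acceptable angle.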
Since the normal vectors within a primitive are consistent, each sample point is representative of the reconstructability of a certain range of its surrounding neighborhood. Thus, a larger sampling interval can be used, so that a smaller number of typical sample points suffices to measure the reconstructability of the whole scene. Note that every triangular face on the proxy is clustered into a specific planar region, i.e., each triangular face has a corresponding sample point to measure its reconstructability, and edges with large variations in normal vectors are also sampled. The resulting sample points cover all the details of the scene to be reconstructed, including edges, corners and fragmentary details, without being particularly dense. Fig. 5 shows the distribution of sample points generated by our representative sampling method and by two other methods. Our sample points are evenly distributed in the planar regions, with additional points sampled on discontinuities, which is especially evident with a 2.5D coarse proxy. Compared with the other two sampling methods, a more representative evaluation, which includes the reconstructability of every fragmentary detail, is achieved with fewer sample points. The validity of our sampling method is verified experimentally below.
Visibility Judgement
For fast optimization, we calculate and save the visibility between viewpoints and sample points in advance. First, a sample point is considered not visible from a viewpoint if the distance between them exceeds twice the photographic distance corresponding to the preset GSD. Second, it is not visible if the shooting angle between the line of sight and the normal vector of the sample point exceeds 60°, which leads to a narrow view. Third, the collinearity equation from photogrammetry is used to determine whether the sample point is actually photographed from the viewpoint. Fourth, the sight occlusion judgment in Fig. 4 is applied. The visibility between viewpoints and sample points is determined once these four conditions have been evaluated.
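The four visibility conditions can be combined into a single predicate, sketched below. This is an illustration under simplifying assumptions, not the authors' implementation: the collinearity test is replaced by a circular field-of-view cone around the viewing direction, the occlusion test is passed in as a callable, and the names `is_visible`, `half_fov_deg` and `occluded` are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def angle_deg(u, v):
    """Angle between two non-zero vectors in degrees."""
    c = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def is_visible(view_pos, view_dir, sample_pos, sample_normal, photo_dist,
               half_fov_deg=30.0, max_angle_deg=60.0, occluded=lambda a, b: False):
    ray = _sub(sample_pos, view_pos)  # viewpoint -> sample point
    dist = _norm(ray)
    # 1) distance must not exceed twice the photographic distance
    if dist == 0.0 or dist > 2.0 * photo_dist:
        return False
    # 2) shooting angle between the sight (sample -> viewpoint) and the surface normal
    if angle_deg(tuple(-c for c in ray), sample_normal) > max_angle_deg:
        return False
    # 3) simplified stand-in for the collinearity test: sample must fall inside a view cone
    if angle_deg(view_dir, ray) > half_fov_deg:
        return False
    # 4) sight occlusion, delegated to a caller-supplied test (e.g. SDSM sampling)
    return not occluded(view_pos, sample_pos)


print(is_visible((0, 0, 20), (0, 0, -1), (0, 0, 0), (0, 0, 1), photo_dist=20.0))
```

Ordering the checks from cheapest to most expensive keeps the precomputation fast, because most viewpoint and sample point pairs are rejected by the distance test alone.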

Greedy Optimization
To quickly optimize the viewpoints, we construct an objective function, Eq. (2). The practical implication of Eq. (2) is to find the best candidate viewpoint, namely the one whose visible, not yet reconstructed sample points have the lowest summed reconstructable value, so that the expected gain of reconstructability is the largest. However, the lowest summed reconstructable value alone does not account for the photographic distance of the viewpoint, the observation angle, or the intersection angle with the existing optimized viewpoints. Eq. (2) therefore contains additional terms. The cardinality of the intersection between the sample points visible from the candidate and those visible from the optimized set favors more intersections with the existing optimized viewpoint set. The average distance from the candidate to the existing optimized viewpoints is maximized, which yields a larger intersection angle with the existing viewpoints. The average shooting angle favors a fronto-parallel pose towards the sample points. In addition, the average distance between the chosen viewpoint and its visible sample point set is kept as close as possible to the preset photographic distance, and a further distance term brings the viewpoint closer to its visible sample points.
The optimal viewpoints are selected iteratively according to Eq. (2), and the final set of optimized viewpoints is obtained when the proportion of sample points satisfying the reconstructability exceeds a preset value. This termination proportion is usually set to 85%-95%, because some clearly occluded areas in the scene and its proxy, such as slits, can never satisfy the reconstructability. Although Eq. (2) is formally complex, many of its terms are fixed values. With a simple algorithmic process, the time complexity of one iteration can be reduced, where the complexity is expressed in terms of the average number of viewpoints visible to a sample point, the average number of sample points visible from a viewpoint, and the number of safe viewpoints, that is, the cardinality of the safe viewpoint set. In this way, the viewpoint optimization process is very fast and the number of viewpoints can be determined as required, ensuring coverage of detail without adding too many viewpoints.
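The iteration and its termination criterion can be sketched as a generic greedy selection loop. The gain used here is a deliberately simplified stand-in and is not Eq. (2): each viewpoint is assumed to add a fixed, precomputed contribution to every sample point it sees, and the distance, angle and intersection terms are omitted. The function name, the `contrib` structure and the example values are hypothetical.

```python
def greedy_select(contrib, num_samples, h_min=1.3, target=0.9):
    """Pick viewpoints until `target` of the sample points reach the threshold h_min.

    contrib[v] maps sample index -> assumed contribution of viewpoint v to that sample.
    Returns the selected viewpoints in order and the final proportion of satisfied samples.
    """
    h = [0.0] * num_samples          # accumulated reconstructable value per sample point
    selected = []
    remaining = set(contrib)

    def satisfied():
        return sum(1 for value in h if value >= h_min)

    def gain(v):
        # only sample points that are still below the threshold contribute to the gain
        return sum(min(c, h_min - h[s]) for s, c in contrib[v].items() if h[s] < h_min)

    while remaining and satisfied() < target * num_samples:
        best = max(sorted(remaining), key=gain)   # sorted() makes ties deterministic
        if gain(best) <= 0:
            break                                  # no candidate can improve coverage
        selected.append(best)
        remaining.remove(best)
        for s, c in contrib[best].items():
            h[s] += c
    return selected, satisfied() / num_samples


views = {"A": {0: 0.7, 1: 0.7}, "B": {0: 0.7, 1: 0.7, 2: 0.7}, "C": {2: 0.7}, "D": {3: 0.1}}
print(greedy_select(views, num_samples=4, target=0.75))
```

Stopping at a proportion below 100% mirrors the termination rule in the text: sample point 3 in the toy data, like a slit in a real scene, can never reach the threshold, and the loop ends without selecting viewpoint D.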
Through greedy optimization, the final viewpoint set can be obtained quickly. We then connect these viewpoints into a flight path using the ACO (Ant Colony Optimization) algorithm, with path occlusion considered. Note that the cost between two viewpoints is set to the ideal energy consumption calculated with a mathematical model constructed from UAV flight dynamics.

Evaluation Metrics
We evaluate our photographic planning method from three aspects: the scene reconstructability based on reconstruction heuristics, the path quality, and the reconstruction quality of the model. The scene reconstructability represents the reconstructability of the sample points and its distribution. However, it is not the actual quality of the reconstructed model. The path is evaluated by its length, ideal flight time and energy. For model quality, accuracy and completeness are measured against the ground truth (GT) using point clouds sampled on the reconstructed models and the GT models (Smith et al., 2018; Zhou et al., 2020).
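Accuracy and completeness on sampled point clouds can be computed from nearest-neighbor distances, as sketched below. The exact definitions used in the experiments are not spelled out here, so this sketch assumes one common convention: accuracy is the distance within which a given percentage of reconstructed points lie from the GT, and completeness is the percentage of GT points within a given distance of the reconstruction. The function names and the brute-force search are illustrative only.

```python
import math


def nn_distances(src, dst):
    """Distance from every point in src to its nearest neighbor in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def accuracy_at(recon, gt, percent=90.0):
    """Smallest distance d such that `percent` of reconstructed points are within d of the GT."""
    d = sorted(nn_distances(recon, gt))
    k = max(0, math.ceil(percent / 100.0 * len(d)) - 1)
    return d[k]


def completeness_at(recon, gt, threshold):
    """Percentage of GT points that have a reconstructed point within `threshold`."""
    d = nn_distances(gt, recon)
    return 100.0 * sum(1 for x in d if x <= threshold) / len(d)


gt_points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
recon_points = [(0, 0, 0.1), (1, 0, 0.1), (2, 0, 0.1)]   # last GT point is not reconstructed
print(accuracy_at(recon_points, gt_points), completeness_at(recon_points, gt_points, 0.2))
```

The two metrics are complementary: the toy reconstruction is accurate wherever it exists but misses one GT point, which only the completeness value reveals. For real point clouds a spatial index such as a k-d tree would replace the brute-force search.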
To verify the effectiveness of the proposed method, self-evaluation experiments are first conducted on our representative sampling and greedy optimization, based on scene reconstructability. Then, in both virtual and real scenes, our method is compared with SOTAs and with the method from commercial software.

Self Evaluation
Effectiveness of Representative Sampling
To validate that the sample points generated by our method are more representative for measuring the scene reconstructability, multiple sets of controlled experiments are set up. For two types of proxies, sample points are generated by random sampling, Poisson sampling and our sampling method, with a similar number of sample points for each method. Three optimized viewpoint sets, one for each type of sample points, are generated using the greedy optimization method described in section 2.2. For the viewpoint set optimized from our sample points, the proportion of random and Poisson sample points that meet the reconstructability is calculated. Similarly, for the viewpoint sets optimized from the other two types of sample points, the proportion of our sample points that meet the reconstructability is calculated. The results are shown in Table 1. Viewpoints (Poisson) denotes the viewpoints obtained by optimizing on the sample points generated by Poisson sampling, and likewise for the other two sampling methods.
The optimization iteration terminates when 95% or 90% of the sample points satisfy the reconstructability.
From Table 1, for the viewpoint sets generated from our sample points, the proportion of the other two types of sample points that satisfy the reconstructability exceeds the proportion for our own sample points. Conversely, for the viewpoint sets generated from the other two types of sample points, the proportion of our sample points that satisfy the reconstructability is always lower than the proportion for their own sample points. Fig. 6 shows the reconstructability distribution of the sample points obtained by different optimized viewpoints. From Fig. 6 (c) and (f), the reconstructability of the planar regions is similar for viewpoint sets optimized from different sets of sample points, while it varies more in the edge regions. As can be seen from Table 1 and Fig. 6, our sampling method is more representative for measuring the reconstructability of the scene, especially in areas such as the edges of the scene.

Effectiveness of Greedy Optimization
To demonstrate the effectiveness of greedy optimization, we generate optimized viewpoint sets for which 95% and 98% of the sample points satisfy the reconstructability, on the 2.5D and 3D proxies of the School scene. We then import the viewpoint sets generated by (Zhou et al., 2020) for comparison, and calculate the proportion of reconstructed sample points and their distribution, as shown in Fig. 7 and Fig. 8. With a 2.5D proxy, the sample points and the scene can be basically reconstructed with 270 images using our method, and the distribution is also uniform. With 320 images, the difference from (Zhou et al., 2020) is small. Similar results are obtained with a 3D proxy. Note that the 330 viewpoints of (Zhou et al., 2020) only satisfy the reconstructability of 96.7% and 96.3% of the sample points, which does not reach the 98% set for our method, demonstrating the effectiveness and efficiency of our greedy method.

Comparison with SOTAs in Virtual Scenes
The UrbanScene3D (Lin et al., 2022) virtual dataset provides proxies and the photographic paths generated by the methods of (Zhang et al., 2021; Zhou et al., 2020), so we compare our method with them. Two virtual scenes are adopted. The School scene is a sophisticated modern building, while the Town scene leans towards a classical style and is considered close to cultural heritage. We import the proxies and generate the corresponding paths. For image acquisition, we develop a script based on Unreal Engine 4 that sets the camera parameters and reads the path file to capture high-resolution images automatically. For model reconstruction, we use the ContextCapture software to carry out aerial triangulation and dense reconstruction, where the local coordinates of the image capture positions are all imported to participate in the aerial triangulation.
In photographic planning and image acquisition, the camera parameters are set consistently with those of (Zhang et al., 2021; Zhou et al., 2020). Table 2 compares the two methods and ours on the School and Town scenes; the path quality of (Zhou et al., 2020) is not compared because the calculated lengths and times are unusually long. From Table 2, our method has the best or near-best reconstruction quality on both types of proxies for the two scenes. Note that since our path captures multi-angle images at the same position, its length is much shorter. However, the UAV must hover to change pose, so the energy saving is smaller than the saving in length. Our energy consumption is nevertheless relatively low, considering that (Zhang et al., 2021) optimize the viewpoints and the path simultaneously to obtain a path with very low energy consumption. Fig. 9 shows the visual fidelity of the reconstructed model on the Town scene compared with (Zhou et al., 2020). Overall, our method realizes higher-quality reconstruction with fewer images and high image acquisition efficiency.

Comparison with MetaShape in Real Scene
MetaShape is a reconstruction software package that combines 3D reconstruction, path planning and other functions. It can automatically plan a photographic path based on a delineated 2D plane range and allows obstacles such as power lines to be defined to avoid collisions. The generated path can be directly exported to the DJI P4RTK for flight.
For the real scene, we choose a historic building at Wuhan University to evaluate our method. The building is rather complex, with many intricate details such as ornate carvings on its surface. We compare our method with MetaShape on this scene. The a priori proxy was reconstructed from images captured with a straight-down pose at a fixed height. Both methods are set to a ground resolution of 0.6 cm, which corresponds to a photographic distance of 20 m with the DJI P4RTK. Fig. 10 compares the planned paths and the reconstructed models, with nearly the same number of viewpoints generated by both methods. In the paths generated by both methods, each viewpoint is oriented towards the heritage scene to be reconstructed as far as possible, and both methods reconstruct the top of the scene well. However, our method achieves better visual fidelity of the model in local detail areas.

Discussion
Our method exploits the geometric structure of the scene and its proxy, which is its most important feature compared to SOTAs and MetaShape. First, we generate a set of regularly arranged viewpoints for subsequent optimization. After optimization, these viewpoints remain relatively regularly arranged. Second, we generate the SDSM for efficient obstacle avoidance and convenient RTK signal analysis. Third, we realize a more representative measurement of scene reconstructability with sample points generated after geometric planar segmentation. In addition, we adopt a greedy optimization strategy to obtain the final optimized viewpoints. Nevertheless, our method can be improved further, for example by using more representative reconstruction heuristics from a learning-based method.

CONCLUSIONS
In this paper, we propose an efficient path planning method for the digital documentation of cultural heritage, which guarantees flight safety and continuity as well as the reconstructability of the whole complex scene. Compared with existing methods, (1) flight safety and continuity are analyzed and guaranteed based on the SDSM; (2) a more representative evaluation of the reconstructability of the heritage scene is realized with sample points on extracted planar regions; (3) the viewpoints are optimized by a fast selection from the safe viewpoint set; (4) the viewpoints are connected into a regular path with a shorter length for image capture. Compared with the commercial method and the state-of-the-art in both real and virtual scenes, we realize high-quality reconstruction of cultural heritage with higher efficiency in image acquisition, providing a complete and efficient UAV photographic path planning solution for high-quality reconstruction of cultural heritage.

Figure 2. The different orientations generated in every voxel.

Figure 3. The generation process of SDSM from the proxy.

Figure 5. Sample points generated by different sampling methods in 3D proxy (top row) and 2.5D proxy (bottom row).

Figure 6. The reconstructability distribution of the sample points obtained by different optimized viewpoints: (a) Random to Random; (b) Random to Ours (Local); (c) Random to Ours (Global); (d) Ours to Random; (e) Ours to Ours (Local); (f) Ours to Ours (Global). A to B corresponds to the reconstructability distribution of sample points of B from viewpoints generated by A. The bluer the color of the sample point, the more redundancy exists; the redder it is, the less the reconstructability is likely to be satisfied. When the reconstructable value of a sample point is close to the maximum threshold of 5.0, it shows a white color.

Figure 7. Comparison of different optimized viewpoint sets and the corresponding distribution of sample points (2.5D proxy).

Figure 8. Comparison of different optimized viewpoint sets and the corresponding distribution of sample points (3D proxy).

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-1/W1-2023, 12th International Symposium on Mobile Mapping Technology (MMT 2023), 24-26 May 2023

Figure 9.

Figure 10. Comparison of the path and reconstructed model between our method (left) and MetaShape software (right).

Notation for Eq. (2): following Eq. (2), we continuously iterate to select the viewpoint that obtains the maximum expected gain of reconstructability and add it to the final optimized set. The quantities involved are: the sample point set visible from the candidate viewpoint; the existing optimized viewpoint set during the current iteration, whose cardinality equals the number of iterations; the average distance from the candidate viewpoint to the optimized viewpoint set, which is set to the maximum distance between two safe viewpoints when the optimized set is empty; the safe viewpoint set; the average distance from the candidate viewpoint to its visible sample point set; the cardinality of the intersection between the sample point set visible from the candidate viewpoint and the sample point set visible from the optimized viewpoint set, where a large value means more intersections with the optimized set; the average shooting angle of the candidate viewpoint to its visible sample points; and the shooting distance corresponding to the preset GSD.

Table 1. The proportion of sample points that satisfy the reconstructability with different sets of optimized viewpoints.