CULTURAL HERITAGE DIGITAL PRESERVATION THROUGH AI-DRIVEN ROBOTICS

This paper introduces a novel methodology for creating 3D models of archaeological artifacts that reduces the time and effort required of operators. The approach uses a simple vision system mounted on a robotic arm that follows a predetermined path around the object to be reconstructed. The robotic system captures the object from different viewing angles and assigns to each image the 3D coordinates corresponding to the robot's pose. The trajectory can be adjusted to accommodate objects of various shapes and sizes. The angular displacement between consecutive acquisitions can also be fine-tuned according to the desired final resolution. This flexibility makes the approach suitable for different object sizes, textures, and levels of detail, from large volumes with low detail to small volumes with high detail. The recorded images and assigned coordinates are fed into a constrained implementation of the structure-from-motion (SfM) algorithm, which uses the scale-invariant feature transform (SIFT) to detect keypoints in each image. Combining a priori knowledge of the coordinates with SIFT keeps processing time low while maintaining high accuracy in the final reconstruction. Because the robotic system acquires images at a pre-defined pace, it ensures high repeatability and consistency across different 3D reconstructions and eliminates operator errors from the workflow. This allows comparisons between similar objects and also makes it possible to track structural changes in the same object over time. Overall, the proposed methodology offers a significant improvement over current photogrammetry techniques by reducing the time and effort required to create 3D models while maintaining a high level of accuracy and repeatability.


INTRODUCTION
The use of 3D measurements and digital reconstruction in the domain of cultural heritage has become increasingly important in recent years, owing to technological advances and greater access to technologies that produce satisfactory results (Ćosović and Maksimović, 2022). Digitally rendered artifacts play a significant role in safeguarding material culture, from small objects to grand architecture and entire cultural heritage sites. This is evident in the growing efforts to systematically digitise global cultural heritage. For instance, the European Commission has recently introduced the common European data space for cultural heritage, a "new flagship initiative" funded under the Digital Europe Programme (DIGITAL) of the European Union, which aims to accelerate the digital transformation of Europe's cultural sector and to promote the creation and reuse of content in the cultural and creative industries (Europeana Foundation, 2023). This initiative highlights the importance of generating and sharing high-quality digital data on cultural heritage entities in a collaborative and accessible way.
Producing digital twins of cultural heritage entities, that is, virtual counterparts of physical products, assets, or systems that reflect their elements and the way these complex systems operate and evolve over time (Ćosović and Maksimović, 2022), is regarded as a pressing necessity for purposes such as preservation, documentation, research, and public engagement.
Given the vast amount of cultural heritage around the world, there is an urgent need to digitise these entities thoroughly, which requires methods that reduce the time needed to record, process, reconstruct, and deliver accurate 3D reproductions. This, in turn, demands automation in every phase of the 3D modelling pipeline (Remondino, 2011). The ultimate goal of 3D digitisation in cultural heritage is to create accurate, detailed, and accessible digital twins of physical objects, assets, and systems.
One of the significant benefits of 3D digitisation in cultural heritage is the ability to preserve and protect physical objects and sites. Digital twins can function as a backup in case of damage or destruction of the original object, and they can also be used to study and understand the object without risking damage. Additionally, 3D digitisation can provide valuable information for conservation and restoration efforts, enabling experts to analyse the structure and condition of the object and identify potential issues that may arise in the future.
Another essential benefit of 3D digitisation in cultural heritage is the ability to make these objects and sites accessible to a broader audience (Tsvetaeva, 2022). Digital twins can be shared online or through virtual reality experiences, enabling people from all over the world to experience and learn about these objects and sites. This can be particularly valuable for objects and sites that are difficult to visit or are located in remote areas.
Furthermore, digital twins can play a crucial role in predictive maintenance of physical assets. By continuously monitoring the data from sensors installed on the asset, the digital twin can detect any anomalies and alert maintenance personnel before significant damage occurs (Luther et al., 2023). This can help prevent costly downtime and repairs.
Overall, digital twins have the potential to revolutionise cultural heritage management and become real "knowledge models" (Gabellone, 2022). As technology continues to evolve and become more sophisticated, we can expect to see even more innovative applications in the future.
In this article, after summarising current developments in the acquisition of 3D models and in the measurement of their resolution, we present a novel approach to producing precise, high-fidelity 3D models of archaeological artifacts by combining 3D data acquisition techniques with a computer vision system mounted on a robotic arm that follows a predetermined path. This technique minimises the need for manual labour while ensuring high accuracy and precision in the resulting models.

Acquisition methods
Various methods are used in the field of Cultural Heritage to acquire, process, and store 3D models, including Structure-from-Motion (SfM), Structured Light Scanning, Laser Scanning, LiDAR, and others.
Structure-from-Motion (SfM) is a popular 3D reconstruction technique that recovers the 3D volume of an object from a series of images taken from different viewpoints by a single camera (Tomasi and Kanade, 1992). The methodology involves several steps: moving the camera around the object, acquiring multiple images, identifying features, matching them across images, and assigning them positions in three-dimensional space. Both the level of detail of the reconstructed volume and the processing time increase with the number of views captured. Capturing more views, however, also lengthens the digitisation itself, which can become a constraint when a large number of entities must be reconstructed at the required quality.
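As a minimal sketch of the matching step, assuming keypoint descriptors (SIFT produces 128-element vectors) have already been extracted from two images, the nearest-neighbour ratio test commonly used with SIFT keeps a match only when the best candidate is clearly closer than the second best. The function names, the toy 3-element descriptors, and the 0.8 ratio are illustrative assumptions, not part of the methodology described in this paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match descriptors of image A to image B with a nearest-neighbour ratio test.

    Returns (index_in_a, index_in_b) pairs. A match is kept only when the nearest
    descriptor in B is closer than `ratio` times the second-nearest one, which
    discards ambiguous matches (e.g. on repetitive or weakly textured surfaces).
    Assumption: brute-force search, for clarity rather than speed.
    """
    if len(desc_b) < 2:
        raise ValueError("the ratio test needs at least two candidates in desc_b")
    matches = []
    for i, query in enumerate(desc_a):
        # Rank all candidates in B by their distance to the query descriptor.
        ranked = sorted(range(len(desc_b)), key=lambda j: euclidean(query, desc_b[j]))
        best, second = ranked[0], ranked[1]
        if euclidean(query, desc_b[best]) < ratio * euclidean(query, desc_b[second]):
            matches.append((i, best))
    return matches


# Toy descriptors: the third query is almost equidistant from two candidates.
desc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
desc_b = [[0.0, 0.98, 0.05], [0.99, 0.02, 0.0], [0.0, 0.0, 1.0]]
print(ratio_test_matches(desc_a, desc_b))  # [(0, 1), (1, 0)]
```

The ambiguous third descriptor is rejected; on real image sets, approximate nearest-neighbour matchers replace the quadratic search.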
Structured Light Scanning is another 3D reconstruction technique, which uses a projector and a camera: a series of known patterns is projected onto the object, the surface geometry deforms these patterns, and the camera records the deformed patterns, from which the 3D shape is recovered by triangulation. Structured Light Scanning offers several advantages, including high accuracy, the ability to capture color information, and fast scanning. However, it also has drawbacks, including sensitivity to ambient light, limited range, and difficulty in capturing fine details (Rachakonda et al., 2019). In addition, the required equipment can be expensive, and the setup process can be time-consuming.
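The triangulation step can be illustrated with the simplest geometry, assuming a rectified projector-camera pair (parallel optical axes, known baseline B, focal length f in pixels). Depth then follows Z = f·B/d, where d is the disparity between where a pattern feature is projected and where it is observed, and the depth change caused by a one-pixel disparity error grows roughly as Z²/(f·B), one reason fine details become harder to resolve at larger distances. Function names and example values are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Depth (mm) for a rectified projector-camera pair: Z = f * B / d.

    Assumption: rectified geometry; real scanners calibrate a full projector model.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def depth_step_per_pixel(focal_px, baseline_mm, depth_mm):
    """Approximate depth change (mm) caused by a one-pixel disparity error: Z^2 / (f * B)."""
    return depth_mm ** 2 / (focal_px * baseline_mm)


z = depth_from_disparity(focal_px=1400.0, baseline_mm=100.0, disparity_px=280.0)
print(z)  # 500.0
print(round(depth_step_per_pixel(1400.0, 100.0, z), 3))  # 1.786
```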
Laser Scanning is a popular technique for the 3D digital documentation and preservation of cultural heritage assets. A laser beam is directed onto the object, and the scanner records the geometry of the surface, typically by measuring the time the light takes to return to the scanner (other designs measure the phase shift of a modulated beam or triangulate the position of the laser spot); colour texture is usually captured by an integrated or external camera. The resulting point cloud can be used to create high-resolution 3D models of objects, buildings, and entire heritage sites, providing valuable information for research, conservation, and public engagement. Laser scanning can capture complex shapes and details that may be difficult to obtain with other techniques. However, it can be expensive, requires technical expertise to operate, and may not be suitable for objects that are sensitive to light or heat (Koch et al., 2017).
LiDAR (Light Detection and Ranging) is another remote sensing technology widely used in cultural heritage applications. LiDAR sends pulses of light to the surface of an object and measures the time taken for the reflected signal to return, allowing the creation of high-resolution 3D point clouds of the object. LiDAR can quickly capture large areas and is particularly useful for outdoor archaeological sites and large structures. It is also capable of capturing details of inaccessible areas, such as the interior of caves or the tops of buildings. However, LiDAR can be expensive and requires specialised equipment, making it less accessible than other 3D scanning methods. Additionally, its accuracy can be affected by atmospheric conditions and vegetation, which can result in incomplete or distorted data.
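The ranging principle shared by pulsed laser scanners and LiDAR reduces to halving the round-trip distance travelled by the light, r = c·t/2. The sketch below assumes propagation at the vacuum speed of light (air slows light by roughly 0.03%, which survey instruments correct for) and shows why timing precision matters: one nanosecond of timing error corresponds to about 15 cm of range. The function name is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, exact by definition (vacuum)


def range_from_round_trip(t_seconds):
    """One-way distance (m) to the target from a pulse's round-trip time.

    Assumption: vacuum speed of light; no atmospheric correction is applied.
    """
    return SPEED_OF_LIGHT * t_seconds / 2.0


print(range_from_round_trip(2e-8))  # about 3.0 m for a 20 ns round trip
print(range_from_round_trip(1e-9))  # about 0.15 m: range error per ns of timing error
```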
Apart from the aforementioned methods, other techniques used for 3D digitisation in cultural heritage include photogrammetry, multi-view stereo (MVS), and time-of-flight (ToF) cameras.
Photogrammetry involves capturing multiple photographs of an object from different angles and using software to reconstruct a 3D model. MVS is closely related to SfM: it uses images of an object taken from multiple viewpoints, typically together with the camera poses recovered by SfM, to compute a dense 3D reconstruction.
ToF cameras are sensors that emit infrared light and measure the time it takes for the light to reflect back, providing depth information that can be used to create 3D models.
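Many ToF cameras use amplitude-modulated continuous light and measure the phase shift φ of the returned signal rather than the travel time of a single pulse. Under that assumption, distance is d = c·φ/(4π·f_mod), and readings wrap around every c/(2·f_mod), which bounds the unambiguous range. The modulation frequency below is an illustrative value, not the specification of any particular sensor, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def distance_from_phase(phase_rad, f_mod_hz):
    """Distance (m) from the phase shift of a continuous-wave ToF signal.

    Assumption: amplitude-modulated continuous-wave operation, no phase unwrapping.
    """
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * f_mod_hz)


def unambiguous_range(f_mod_hz):
    """Largest distance (m) measurable before the phase wraps around 2*pi."""
    return SPEED_OF_LIGHT / (2.0 * f_mod_hz)


f_mod = 20e6  # 20 MHz modulation, illustrative
print(unambiguous_range(f_mod))             # about 7.49 m
print(distance_from_phase(math.pi, f_mod))  # about 3.75 m, half the unambiguous range
```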
Each of these techniques has its advantages and disadvantages, and the choice of technique will depend on factors such as the size and complexity of the object, the desired level of detail, and the available resources.
In heritage studies and archaeological practice, SfM is still considered the standard, owing to the increasing quality of digital cameras and their lower cost (Chandler and Buckley, 2016). In addition, the superior quality of the textures obtained with SfM, which is of utmost importance to archaeologists and heritage experts, makes it an attractive option (Kaneda et al., 2022). However, SfM has limitations, such as slower processing than other methods and the potential for operator error when capturing the required images.
Advanced technologies such as AI-powered robotics can automate parts of these 3D digitisation techniques. Robotics has become increasingly necessary in various industries because of the demand for efficiency, accuracy, and cost-effectiveness, and the field of Cultural Heritage has similar needs. Automation can significantly improve data acquisition by reducing the manual labour involved in capturing and processing data, which is time-consuming and prone to human error. Moreover, automation makes it possible to process large amounts of data in a fraction of the time it would take manually, allowing a quicker and more comprehensive analysis of cultural heritage assets.
Therefore, given the current state of the art in 3D modelling techniques and the need for faster and more accurate methods, automation is an attractive option. By automating the acquisition techniques, we can reduce the limitations imposed by manual work and improve the quality and efficiency of 3D modelling processes.
To achieve this objective, we propose a workflow that integrates AI-powered robotics into the SfM technique. This workflow aims to automate the entire process of capturing images, identifying features, matching them, and assigning them to a position in three-dimensional space. By automating this process, we can achieve faster and more accurate results with minimal human intervention, thus increasing the productivity and efficiency of the 3D modelling process.

Resolution
Resolution is a crucial factor for 3D reconstruction techniques. Here, resolution denotes the size of the smallest detail that can be reliably distinguished, so the quality of the 3D model improves as the resolution value decreases, allowing more precise and accurate reconstructions. The resolution is primarily determined by the reconstruction algorithm and by the input images. Previously, the resolution of a 3D model was estimated from the object's diameter and the camera's spatial acquisition rate. However, advances in computer vision and technology have made this relationship unreliable, so that it yields only qualitative results. Despite this, the notion that a finer spacing of observation angles leads to higher-quality results and a lower final resolution value persists. An accurate measurement of a reconstructed 3D model's resolution can therefore indicate the quality of the reconstruction itself (van Heel et al., 2020).
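The classical relationship between object size, number of views, and resolution is, we assume, the Crowther criterion from tomographic reconstruction: for N views equally spaced over 180° about a single axis, an object of diameter D can be reconstructed to a resolution of about d = π·D/N, equivalently d = D·Δθ for an angular step Δθ in radians. As noted above, this is only a qualitative guide for SfM; the function names and example values are illustrative.

```python
import math


def crowther_resolution(diameter, n_views):
    """Resolution (same unit as diameter) for n_views equally spaced over 180 degrees.

    Assumption: single-axis tomographic geometry (Crowther criterion), used here
    only as a qualitative guide.
    """
    return math.pi * diameter / n_views


def views_needed(diameter, target_resolution):
    """Smallest number of views over 180 degrees reaching target_resolution."""
    return math.ceil(math.pi * diameter / target_resolution)


# A 200 mm object sampled every 5 degrees (36 views over 180 degrees):
print(round(crowther_resolution(200.0, 36), 2))  # 17.45 (mm)
print(views_needed(200.0, 2.0))                  # 315
```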
The algorithms to measure the resolution of 3D models can be grouped into two major categories (Penczek, 2002):
• techniques based on the comparison of averaged subsets of the data, such as the Fourier Shell Correlation (FSC) (van Heel et al., 2005) or the Differential Phase Residual (DPR) (Frank et al., 1981);
• algorithms based on the Fourier transform of individual images, such as the Q-factor (Kessel et al., 1985) and the spectral signal-to-noise ratio (SSNR) (Unser et al., 1987).
The first group of algorithms has a significant advantage over the second, as it can measure resolution in both 2D and 3D (Penczek, 2002). FSC is the dominant method for measuring resolution and has become the standard in recent years. FSC was proposed in 1987 by Harauz and van Heel as the 3D extension of the Fourier Ring Correlation (FRC). It measures the normalised cross-correlation, in Fourier space, between two 3D reconstructions computed from two independent halves of the same dataset. The correlation is evaluated shell by shell, comparing equivalent spatial-frequency regions of the two reconstructions, and the resolution is taken as the frequency at which the FSC drops below a given threshold. The threshold is conventionally set at 0.143, the half-map FSC value that corresponds to a correlation of 0.5 between the full reconstruction and a perfect reference map.
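To make the FSC definition concrete, the sketch below computes FSC(r) = Re Σ F1·F2* / sqrt(Σ|F1|² · Σ|F2|²) over integer-radius shells of two small cubic volumes, using a naive DFT so that only the standard library is needed, and finds the first shell where the curve falls below 0.143. It also includes the relation C_ref = sqrt(2·FSC/(1 + FSC)) linking the 0.143 threshold to a correlation of 0.5 with a perfect reference. All names are hypothetical; production software uses FFTs, masking, and finer shell binning.

```python
import cmath
import math


def dft_1d(seq):
    """Naive O(n^2) discrete Fourier transform of a list of numbers."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft_3d(vol):
    """3D DFT of an n x n x n nested list, applied one axis at a time (separability)."""
    n = len(vol)
    f = [[[complex(vol[x][y][z]) for z in range(n)] for y in range(n)] for x in range(n)]
    for x in range(n):  # transform along z
        for y in range(n):
            f[x][y] = dft_1d(f[x][y])
    for x in range(n):  # transform along y
        for z in range(n):
            col = dft_1d([f[x][y][z] for y in range(n)])
            for y in range(n):
                f[x][y][z] = col[y]
    for y in range(n):  # transform along x
        for z in range(n):
            col = dft_1d([f[x][y][z] for x in range(n)])
            for x in range(n):
                f[x][y][z] = col[x]
    return f


def fsc(vol1, vol2):
    """FSC value for each integer-radius shell, from the DC term up to Nyquist."""
    n = len(vol1)
    f1, f2 = dft_3d(vol1), dft_3d(vol2)
    n_shells = n // 2 + 1
    cross = [0.0] * n_shells
    pow1 = [0.0] * n_shells
    pow2 = [0.0] * n_shells
    for x in range(n):
        for y in range(n):
            for z in range(n):
                # Signed frequency indices so that shells are centred on the origin.
                kx, ky, kz = (k if k <= n // 2 else k - n for k in (x, y, z))
                r = round(math.sqrt(kx * kx + ky * ky + kz * kz))
                if r >= n_shells:
                    continue  # corner frequencies beyond Nyquist are ignored
                a, b = f1[x][y][z], f2[x][y][z]
                cross[r] += (a * b.conjugate()).real
                pow1[r] += abs(a) ** 2
                pow2[r] += abs(b) ** 2
    return [cross[r] / math.sqrt(pow1[r] * pow2[r]) if pow1[r] > 0 and pow2[r] > 0 else 0.0
            for r in range(n_shells)]


def resolution_shell(curve, threshold=0.143):
    """Index of the first shell whose FSC falls below threshold, or None."""
    for r, value in enumerate(curve):
        if value < threshold:
            return r
    return None


def c_ref(fsc_value):
    """Expected correlation with a perfect reference for a given half-map FSC value."""
    return math.sqrt(2.0 * fsc_value / (1.0 + fsc_value))


print(round(c_ref(0.143), 3))  # 0.5
```

For a box of n voxels of size s, shell r corresponds to a spatial resolution of n·s/r. Two noisy half-reconstructions of the same object give a curve that starts near 1 and decays as noise dominates the higher shells.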
However, the efficacy of FSC has been widely debated because of limitations inherent in the method itself. Splitting the dataset to create two independent reconstructions inevitably biases the final resolution estimate, since each half contains only part of the data, and FSC produces a single global value that does not capture local variations in resolution across the reconstructed model. To overcome these problems, the ResMap algorithm was proposed (Kucukelbir et al., 2013). ResMap estimates local resolution by testing, at different points of the volume, whether a local 3D sinusoid-like feature of a given wavelength is detectable above noise, and assigns to each point the wavelength of the smallest such sinusoid.

A novel semi-automatised methodology
The 3D reconstruction methodology proposed in this paper is based on the Structure-from-Motion (SfM) technique, which reconstructs the 3D volume of an object from a series of 2D images captured by a camera from different views. Typically, SfM-based methodologies identify feature points (also called keypoints) in the acquired images (Wu et al., 2010) and match them across the entire series (Iglhaut et al., 2019). Outlier matches are then removed, and the positions of the matched keypoints in the different images are used to simultaneously estimate the camera poses and assign a set of 3D coordinates to each keypoint. This joint refinement of poses and 3D points, which minimises the reprojection error, is known as bundle adjustment, and it results in a sparse 3D point cloud of the scene.

Figure 1.
Schematic representation of the 2D image acquisition system. A UR3 robotic arm, equipped with a stereo camera, moves along circular trajectories of variable radius, tracing a hemisphere around the object to be reconstructed. The arm stops at a pre-defined pace, enabling the camera to acquire images. The acquisition points are labelled with black dots, corresponding to intersections between the circular trajectories (in violet), which represent the Z resolution, and the acquisition rate (in cornflower blue), which sets the angular resolution.
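When camera centres and viewing directions are known, as they are when a robot supplies the poses, assigning 3D coordinates to a matched keypoint reduces to intersecting viewing rays. A minimal sketch, assuming each keypoint has already been back-projected into a ray direction using the camera intrinsics and orientation, is the midpoint method, which returns the midpoint of the shortest segment between two rays. SfM packages typically use linear (DLT) triangulation followed by nonlinear refinement instead; the function names here are hypothetical.

```python
def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def scale(a, s):
    return [x * s for x in a]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centres; d1, d2: ray directions towards the same keypoint.
    Assumption: rays are already expressed in a common (robot base) frame.
    """
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, t))
    return scale(add(p1, p2), 0.5)


# Two cameras 4 units apart observing the point (1, 2, 5).
print(triangulate_midpoint([0, 0, 0], [1, 2, 5], [4, 0, 0], [-3, 2, 5]))  # [1.0, 2.0, 5.0]
```

With noisy rays the two closest points no longer coincide, and the length of the segment between them is a useful consistency check on the poses.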
However, in SfM-based techniques the quality and resolution of the reconstructed scene depend strongly on the number of views and on the accuracy of pose estimation. To overcome these limitations, an automated routine was designed to acquire a large number of images at a constant pace. A UR3 robotic arm with an RGB camera mounted on its wrist was programmed to perform circular trajectories, centred on the scene to be reconstructed, at different heights. The radius of these circles decreases with height, so that together they form a hemisphere around the object. The number of circular trajectories and the acquisition rate define the Z resolution and the angular resolution, respectively. The robotic arm travels along each circular trajectory, stopping to acquire images of the scene at a pace determined by the acquisition rate (Figure 1). Both resolutions have a significant impact on the quality of the reconstructed 3D model: a higher acquisition rate results in a more finely resolved reconstruction. However, processing a large number of images is computationally demanding, leading to longer processing times and requiring more powerful workstations. Both resolution values therefore need to be selected carefully, taking into account the details of the object, the desired final resolution, and the processing time.
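The acquisition geometry can be sketched as follows, assuming the object's centre sits at the origin, the camera stays at a constant distance R from it, and the circular trajectories are spaced evenly in elevation between two hypothetical limits. Each circle has radius R·cos(elevation) at height R·sin(elevation), so the circles shrink as they rise; the number of circles plays the role of the Z resolution and the stops per circle the angular resolution. Names and default angles are illustrative, not the values used by the authors.

```python
import math


def hemisphere_viewpoints(distance, n_rings, stops_per_ring,
                          min_elev_deg=15.0, max_elev_deg=75.0):
    """Camera positions on stacked circles forming a hemisphere around the origin.

    Returns a list of rings; each ring is a list of (x, y, z) positions.
    Assumption: rings evenly spaced in elevation between the two limits.
    """
    if n_rings < 1 or stops_per_ring < 1:
        raise ValueError("need at least one ring and one stop per ring")
    step = (max_elev_deg - min_elev_deg) / (n_rings - 1) if n_rings > 1 else 0.0
    rings = []
    for i in range(n_rings):
        elev = math.radians(min_elev_deg + i * step)
        ring_radius = distance * math.cos(elev)  # shrinks as the ring rises
        height = distance * math.sin(elev)
        ring = []
        for k in range(stops_per_ring):
            az = 2.0 * math.pi * k / stops_per_ring  # angular step = 360 / stops degrees
            ring.append((ring_radius * math.cos(az), ring_radius * math.sin(az), height))
        rings.append(ring)
    return rings


rings = hemisphere_viewpoints(distance=0.35, n_rings=4, stops_per_ring=24)
print(sum(len(r) for r in rings))  # 96 acquisition points
```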
To determine suitable values, an AI-based algorithm has been developed. The robot rotates around the centre of the scene at a fixed distance, acquiring four images 90 degrees apart at a 45-degree angle from the horizontal plane. These images are combined into a coarse 3D model from which the object's dimensions and centre of mass are estimated. These estimates are then used as input for the AI-based technique, which outputs the radius of the circular trajectories and the angular and Z resolutions.
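The initial survey can be sketched under simple assumptions: the four views lie 90° apart in azimuth at 45° elevation and a fixed distance from the scene centre, and the centroid of the coarse point cloud stands in for the centre of mass (strictly valid only for a uniformly dense, uniformly sampled object). The bounding-box extent gives the object's dimensions. Both functions are hypothetical helpers, not the AI-based technique itself.

```python
import math


def initial_views(distance, elevation_deg=45.0):
    """Four camera positions 90 degrees apart in azimuth at a fixed elevation."""
    elev = math.radians(elevation_deg)
    views = []
    for k in range(4):
        az = math.radians(90.0 * k)
        views.append((distance * math.cos(elev) * math.cos(az),
                      distance * math.cos(elev) * math.sin(az),
                      distance * math.sin(elev)))
    return views


def coarse_stats(points):
    """Centroid and axis-aligned extent (dx, dy, dz) of a coarse point cloud.

    Assumption: the centroid of the points approximates the centre of mass.
    """
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    extent = tuple(max(p[i] for p in points) - min(p[i] for p in points) for i in range(3))
    return centroid, extent


print(len(initial_views(0.4)))  # 4
print(coarse_stats([(0, 0, 0), (2, 0, 0), (0, 4, 0), (2, 4, 6)]))  # ((1.0, 2.0, 1.5), (2, 4, 6))
```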

Figure 2.
Schematic representation of the AI-based technique defining the camera acquisition rates. The system generates an initial coarse 3D model to estimate the object's dimensions and centre of mass, and then processes these values to determine the radius of the circular trajectories, as well as the angular and Z resolutions. The technique selects the best solutions by ensuring that the object does not overlap by more than 90% in consecutive images (highlighted in green).
The system identifies the best solutions by ensuring that the overlap of the object between consecutive images does not exceed 90% (Figure 2). The ability to finely adjust both the Z and angular resolution values allows a precise and dense acquisition of views, resulting in a high-accuracy reconstruction of the scene. Additionally, these values can easily be modified to optimise the reconstruction for objects of different sizes. The pose of each view is given by the pose of the tool centre point, expressed as two sets of three coordinates describing its position and orientation, respectively (see Figure 3).
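Universal Robots controllers express the tool centre point pose as p[x, y, z, rx, ry, rz], where (rx, ry, rz) is an axis-angle rotation vector in radians. Assuming that convention, and assuming the camera looks along the tool's z-axis, each view's position and viewing direction can be recovered with Rodrigues' formula as sketched below. The helper names are hypothetical, and any camera-to-flange offset (hand-eye calibration) is ignored.

```python
import math


def rotvec_to_matrix(rx, ry, rz):
    """3x3 rotation matrix from an axis-angle rotation vector (Rodrigues' formula)."""
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def view_from_tcp_pose(pose):
    """Camera position and viewing direction from a [x, y, z, rx, ry, rz] pose.

    Assumption: the optical axis coincides with the tool z-axis.
    """
    x, y, z, rx, ry, rz = pose
    r = rotvec_to_matrix(rx, ry, rz)
    direction = (r[0][2], r[1][2], r[2][2])  # third column = rotated z-axis
    return (x, y, z), direction


# A tool flipped by pi about the base x-axis looks straight down.
position, direction = view_from_tcp_pose([0.1, 0.2, 0.3, math.pi, 0.0, 0.0])
print(position, direction)  # direction is approximately (0, 0, -1)
```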
Another advantage of the robotic arm is its ability to move with millimetre precision, which overcomes one of the main limitations of the SfM methodology. Because the robot provides highly accurate pose values and eliminates operator errors from the workflow, the poses of all views are well constrained, which simplifies the reconstruction algorithm and improves its performance. The poses obtained are first used to correct any camera distortion, and then to assign a set of 3D coordinates to the different images, generating a dense and refined point cloud.
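Lens-distortion correction is normally driven by coefficients obtained from a camera calibration. As an illustration only, assuming the common two-coefficient radial (Brown) model in normalised image coordinates, the sketch below applies the distortion and inverts it by fixed-point iteration. The coefficients are hypothetical, and the text does not specify which distortion model the authors use.

```python
def distort(xu, yu, k1, k2):
    """Apply two-term radial distortion to normalised, undistorted coordinates.

    Assumption: radial-only Brown model; tangential terms are omitted.
    """
    r2 = xu * xu + yu * yu
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return xu * factor, yu * factor


def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration (adequate for mild distortion)."""
    xu, yu = xd, yd  # start from the distorted point
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / factor, yd / factor
    return xu, yu


k1, k2 = -0.2, 0.05  # hypothetical barrel-distortion coefficients
xd, yd = distort(0.3, -0.2, k1, k2)
print(undistort(xd, yd, k1, k2))  # approximately (0.3, -0.2)
```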

Figure 4.
Schematic representation of the comparison between the conventional 3D reconstruction technique (a) and the one proposed in this paper (b). The proposed method uses a robot to rotate around the scene, which constrains and simplifies the methodology.
To evaluate the quality of the reconstruction, it is essential to measure the resolution of the 3D model. The 3D model obtained with the proposed methodology is therefore processed with the ResMap algorithm, which produces a local resolution map and thus a distribution of resolution values across the density map. A precise local estimate of the resolution enables structural analysis by establishing which elements of the 3D reconstruction are significant. Furthermore, the method's robustness to noise helps avoid mistaking high-frequency noise for high-quality detail, since such noise may be visually appealing but degrades the information (Röhrbein et al., 2015). Figure 4 compares the traditional methodology (a) with the one proposed in this paper (b), which eliminates the need for bundle adjustment and directly generates a dense point cloud. This not only improves efficiency but also reduces computational overhead, resulting in shorter reconstruction times during the computation stage.

CONCLUSIONS
Acquiring images and transforming them into 3D models is a complex process that requires careful consideration of various factors. One of the most critical is the acquisition rate of the images, which can significantly affect the accuracy and consistency of the final reconstruction. Conventional methodologies often involve capturing images in an unordered sequence, which can lead to variations between reconstructed 3D models, even when the same set of images is used.
To address this issue, our proposed method uses a robotic arm to standardise the image acquisition process. The robotic arm is programmed to move in an optimised manner, capturing images at specific intervals that avoid redundant information and excessive computation times while ensuring high-quality results. This approach offers several advantages over traditional methodologies: it increases the repeatability and consistency of the reconstructions and improves their accuracy and reliability.
A key advantage of using a robotic arm for image acquisition is that it constrains the acquisition rate. Because images are captured at a consistent pace and interval, the variability introduced by the acquisition itself is minimised, so the reconstruction represents the scene more accurately, including small variations due to degradation or damage.
Another advantage of our proposed method is that it provides a more reliable basis for comparing different 3D models. With highly consistent acquisition rates, we can be confident that any differences between the models are due to actual changes in the scene, rather than variations in the image sequence or reconstruction process. This increases the accuracy of the final model and makes it easier to identify any changes or differences between the models.
Our proposed method offers promising opportunities for the future development of 4D applications in cultural heritage preservation. Adding time as the fourth dimension makes it possible to track changes over time, which is crucial for the conservation of material culture and enables Digital Twins that go beyond simple 3D scans. Consistent reconstructions allow structural variations in the reconstructed volumes to be monitored over time with greater precision. Such monitoring has become an essential component of the conservation maintenance of archaeological artifacts, enabling better-informed decisions about how to protect and preserve these valuable pieces of our collective cultural heritage.
With our method, errors in evaluating changes are minimised, providing more reliable distinctions between conditions of damage or aging. This is particularly important in cultural heritage preservation, where accurate and reliable data are crucial for informed decisions about how to conserve and protect these valuable artifacts.
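A simple way to quantify such changes, assuming two reconstructions already expressed in the same reference frame (which a fixed robotic set-up makes plausible), is the cloud-to-cloud distance: for every point of the newer model, the distance to its nearest neighbour in the older one. The sketch below uses brute force for clarity (spatial indices such as k-d trees are used in practice). The tolerance is a hypothetical parameter that could, for instance, be tied to the local resolution estimated with ResMap.

```python
import math


def cloud_to_cloud_distances(reference, compared):
    """For each point in `compared`, distance to its nearest point in `reference`.

    Assumption: both clouds share one coordinate frame; brute force, O(N * M).
    """
    return [min(math.dist(p, q) for q in reference) for p in compared]


def flag_changes(reference, compared, tolerance):
    """Indices of points in `compared` farther than `tolerance` from `reference`."""
    distances = cloud_to_cloud_distances(reference, compared)
    return [i for i, d in enumerate(distances) if d > tolerance]


before = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
after = [(0.0, 0.0, 0.001), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
print(flag_changes(before, after, tolerance=0.01))  # [2]
```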