The Legacy of Sycamore Gap: The Potential of Photogrammetric AI for Reverse Engineering Lost Heritage with Crowdsourced Data

The orientation of crowdsourced and multi-temporal image datasets is a challenging task for traditional photogrammetry. Traditional image matching approaches often struggle to find accurate and reliable tie points in images that differ significantly in appearance. In this paper, to preserve the memory of the Sycamore Gap tree, a symbol of Hadrian's Wall that was felled in an act of vandalism in September 2023, deep learning-based features trained specifically on challenging image datasets were employed to overcome the limitations of traditional matching approaches. We demonstrate how unordered crowdsourced images and UAV videos can be oriented and used for 3D reconstruction, together with a recently acquired terrestrial laser scanner point cloud for scaling and referencing. This allows the memory of the Sycamore Gap tree to live on and demonstrates the potential of photogrammetric AI (Artificial Intelligence) for reverse engineering lost heritage.


INTRODUCTION
The documentation and digital preservation of cultural heritage are crucial for tracing and passing on our past, for raising awareness (Stylianidis and Remondino, 2016) and for achieving the United Nations Sustainable Development Goals (Xiao et al., 2018). Heritage may include man-made objects and intangible aspects of culture, but also important landmarks that retain significance for local communities and visitors. The risk of destruction and obliteration, due to natural or anthropogenic causes, makes the documentation of heritage sites essential. When preservation is no longer possible, the ability to reconstruct lost heritage through digital documentation is of paramount importance to preserve the memory and knowledge of our past. If reality-based 3D surveying methods cannot be applied (Guidi and Russo, 2011; Remondino, 2011; Remondino et al., 2018), archival or crowdsourced data can support 3D digitization tasks. The 3D reconstruction of lost or at-risk heritage from touristic crowdsourced imagery has been studied for many years (e.g. Grün et al., 2004; Stathopoulou et al., 2015; Vincent et al., 2015; Fangi, 2015; Grussenmeyer and Al Khalil, 2017; Dhonju et al., 2018; Maiwald et al., 2021; Alsadik, 2022; Mazzacca et al., 2023). The main idea is to leverage the popularity of heritage objects to gather enough images for photogrammetric processing (Bonacchi et al., 2014; Doulamis et al., 2020; Fangi et al., 2022; Jaud et al., 2022; Shivottam et al., 2023). 3D reconstructions can support small-scale physical replicas for valorization purposes and can also bring lost objects back to life through augmented reality (AR) and virtual reality (VR) applications (Alkhatib et al., 2023).

Although the value of crowdsourced datasets is undeniable, their inhomogeneous and unorganized nature presents several challenges that must be appropriately addressed. This work tackles the 3D reconstruction of a lost natural heritage asset: the iconic Sycamore Gap tree (Figure 1), situated at the UNESCO World Heritage Site of Hadrian's Wall and Housesteads Fort, close to Once Brewed in Northumberland National Park, England (Northumberland National Park, 2024). Sadly, in an act of apparent vandalism, the ca. 150-year-old tree was felled on the evening of 27th September 2023 (Figure 2), leaving the UNESCO World Heritage Site without one of its most recognizable landmarks. Various Roman heritage sites in the vicinity of Sycamore Gap have benefited from prior geospatial research integrating heterogeneous information to better understand how the landscape along Hadrian's Wall has changed over time (e.g. Fieber et al., 2017).

Photogrammetric reconstruction from archival or crowdsourced images presents specific challenges and may not yield sufficiently accurate and complete results, owing to poor camera networks and low image quality. Further challenges are the multi-temporal nature of the data and unfavourable viewing angles, which hamper automatic tie point extraction. In terms of accuracy, there is also a risk of unreliable modelling of the cameras' internal parameters during self-calibration, as the images were captured with many different sensors. Since 2016, new automatic image matching techniques based on deep learning (DL) have been proposed (Jin et al., 2021). Methods trained specifically to extend the limits of traditional matching based on local features such as SIFT (Lowe, 2004) open up new applications in the automatic matching of complex datasets (Farella et al., 2022; Morelli et al., 2022; Remondino et al., 2022; Elias et al., 2023; Maiwald et al., 2023; Markiewicz et al., 2023; Morelli et al., 2023; Ioli et al., 2024). Building on these photogrammetric AI approaches, this paper aims to valorize the legacy of the Sycamore Gap tree through its geometric and colorimetric 3D reconstruction, for documentation and restitution to the community. The work demonstrates that deep learning-based local features are a useful tool for addressing the challenges posed by multi-temporal crowdsourced images and multi-modal datasets.
Figure 2: The felled sycamore tree after the deliberate act of vandalism.

DATA
The data consist of terrestrial images and UAV videos acquired with different sensors and in different seasons. The terrestrial dataset comprises approximately 180 photos, mostly provided by the National Trust or downloaded from various internet sources (Wikimedia Commons, Flickr, Unsplash, etc.). Image resolution varies greatly (min. 480 × 359 px, max. 9504 × 6336 px). Although some images are high-resolution, which is important for reconstructing the finer details of the lost tree, many are low-resolution or capture the tree from a large camera-object distance. Because flying drones over National Trust land without permission is prohibited, little publicly available drone footage exists. However, the National Trust provided four UAV video sequences, mostly high-resolution but affected by significant video compression. The videos generally cover a good range of viewing angles, but most focus on the most spectacular and identifiable view of the tree between the two adjacent hills (Figure 1). The acquisition perspectives of the multi-temporal datasets differ greatly; in addition, much of the scene is covered by grass, and illumination varies considerably. The nature of the sycamore tree adds a further layer of complexity to the reconstruction task: throughout its history, the tree underwent a wide range of long- and short-term changes that are reflected in the collected datasets (leaf growth and fall, wind-driven movements, etc.). All these issues hinder traditional automatic image-matching approaches and suggest the use of learning-based methods, which have proven consistent and reliable in various situations (e.g. Remondino et al., 2021; Peppa et al., 2022; Morelli et al., 2022; Morelli et al., 2024).

A Terrestrial Laser Scanning (TLS) survey of the Sycamore Gap tree and its immediate surroundings was performed by Newcastle University on 11th October 2023, 14 days after the tree was felled and immediately before its removal from the site (Figure 3). The time window available before the removal of the tree limited the scan time to 90 minutes. The TLS point cloud, captured with a Leica RTC360 scanner, was used to scale the photogrammetric models and as a reference to verify the quality of the image-based 3D reconstruction.

METHODOLOGY
The paper's objectives are twofold:
• to orient all available datasets using photogrammetric AI methods;
• to derive a 3D reconstruction of the lost tree, emphasizing geometric and colorimetric accuracy.
The results are expected to be useful for historical-documentary purposes, web-based visualization and AR/VR applications.

Orientation of crowdsourced images and videos
3D reconstruction from unregistered Internet-scale photo collections has received significant attention in recent years, driving advances in image matching and Structure from Motion (SfM) (Frahm et al., 2010; Agarwal et al., 2011; Schonberger and Frahm, 2016). Crowdsourced images can pose significant challenges for matching, primarily due to the multi-resolution and multi-temporal nature of the data and the extreme variations in baseline and viewing angle (including camera rotations). Conventional hand-crafted local features typically struggle with large viewing angle changes and have limited invariance to illumination. To address these challenges, we employed SuperPoint+LightGlue (DeTone et al., 2018; Lindenberger et al., 2023), a deep learning-based local feature extractor and matcher specifically trained to deal with extreme scenarios. However, like other deep learning-based local features, they were neither designed nor trained to operate on high-resolution images, owing to computational constraints. To mitigate this limitation, we used the deep-image-matching Python library (DIM, https://github.com/3DOM-FBK/deep-image-matching; Morelli et al., 2024), which enables learning-based matching methods to be applied to high-resolution images through a tiling approach. Once correspondences were found (for both terrestrial images and video frames), camera poses were retrieved using COLMAP (Schonberger and Frahm, 2016). As detailed in Section 3.2, rendered views of the available TLS point cloud were used to scale and reference the photogrammetric results. In this context, deep learning-based local features allow rendered images to be matched with real images, identifying tie points despite large radiometric differences. The accuracy of image localization was assessed using natural check points identified on the tree.
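The tiling idea can be illustrated with a short Python sketch. It is not DIM's code: the tile size of 2048 px, the 256 px overlap and all function names are hypothetical. The image is split into overlapping windows small enough for a learned extractor, and keypoints found inside a window are shifted back to full-resolution pixel coordinates.

```python
# Hypothetical sketch of tiled keypoint extraction for high-resolution images.
# Assumption: the learned extractor accepts at most tile_size x tile_size pixels;
# tile_size=2048 and overlap=256 are illustrative values, not DIM defaults.


def tile_origins(length, tile_size, overlap):
    """Start offsets along one axis so that tiles of tile_size cover [0, length)."""
    if overlap >= tile_size:
        raise ValueError("overlap must be smaller than tile_size")
    if length <= tile_size:
        return [0]
    step = tile_size - overlap
    starts = list(range(0, length - tile_size, step))
    starts.append(length - tile_size)  # last tile sits flush with the image border
    return starts


def tile_grid(width, height, tile_size=2048, overlap=256):
    """Return (x0, y0, x1, y1) pixel windows that together cover the whole image."""
    return [
        (x, y, min(x + tile_size, width), min(y + tile_size, height))
        for y in tile_origins(height, tile_size, overlap)
        for x in tile_origins(width, tile_size, overlap)
    ]


def to_full_resolution(tile_keypoints, tile):
    """Shift keypoints detected inside one tile back to full-image coordinates."""
    x0, y0 = tile[0], tile[1]
    return [(u + x0, v + y0) for u, v in tile_keypoints]


# Largest and smallest image sizes reported for the terrestrial dataset.
big = tile_grid(9504, 6336)
small = tile_grid(480, 359)
print(len(big), small)
```

With these settings the largest terrestrial image (9504 × 6336 px) is covered by a 6 × 4 grid of 24 tiles, while the smallest (480 × 359 px) fits in a single tile. Keypoints detected twice in the overlapping strips would still need to be deduplicated in a real pipeline.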

Rendered views from TLS data
Traditionally, the alignment of a photogrammetric block and a TLS dataset involves an initial approximate registration of the two point clouds, followed by fine co-registration using iterative closest point (ICP) techniques. However, this approach has several limitations. Firstly, the initial rough alignment can be time-consuming, especially when it must be repeated for multiple photogrammetric models (in this case study, one model per video) and when dense photogrammetric point clouds must be generated for a substantial portion of the scene. Additionally, this method lacks deep integration between photogrammetric and TLS data, as it applies only a rigid transformation. Moreover, it relies exclusively on geometry and neglects colorimetric information, potentially leading to co-registration failures in scenes with limited geometric features. Following Elias et al. (2023), we propose a more integrated approach to combining TLS and photogrammetric data. The TLS point cloud is rendered from multiple viewpoints with known positions, orientations and camera parameters, and the rendered images are integrated directly into the photogrammetric pipeline alongside the original images. Since the rendered camera positions (in the TLS reference frame) are known with high precision, they can serve as constraints in the bundle block adjustment, improving the accuracy of the estimated camera poses and 3D tie points while also providing scale. This approach is expected to improve photogrammetry-TLS co-registration, as well as TLS-TLS co-registration in environments with limited geometric features. The synthetic images were rendered using the Python library PlotOptix (Sulej, 2019), based on the NVIDIA OptiX ray tracing engine (Parker et al., 2010). We defined a virtual pinhole camera with known parameters and rendered multiple perspective views of the textured and georeferenced TLS point cloud. In our case, the section of the point cloud containing the felled tree was intentionally excluded to prevent incorrect feature matching with images of the healthy tree (Figure 4-left). One nadir and 20 oblique poses (Figure 4-right) were selected to provide a robust and general basis for feature matching with the available crowdsourced images from both terrestrial and UAV sources. The rendered images have a resolution of 1920 × 1080 px and an approximate Ground Sampling Distance (GSD) of 2 cm.
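A virtual camera network like the one described above can be sketched as follows, assuming a ring of 20 oblique cameras aimed at the stump plus one nadir view. The radius, heights, target point and the principal point (960, 540), the centre of a 1920 × 1080 image, are hypothetical; the sketch only computes poses and projections, while the rendering itself (done with PlotOptix in this work) is not reproduced. The focal length follows from the pinhole relation GSD = D / f, with f in pixels, so a 2 cm GSD at distance D requires f = D / 0.02.

```python
import math

# Minimal sketch of a ring of virtual pinhole cameras around a target point.
# Radius, heights and target are hypothetical. R maps world to camera
# coordinates with x right, y down, z forward (computer-vision convention).


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(p / n for p in a)


def look_at(center, target, up=(0.0, 0.0, 1.0)):
    """Rows of the world-to-camera rotation for a camera at center aimed at target."""
    z = _unit(_sub(target, center))
    x = _unit(_cross(z, up))
    y = _cross(z, x)
    return (x, y, z)


def camera_ring(target, radius, height, n_oblique=20, nadir_height=None):
    """n_oblique cameras evenly spaced in azimuth, plus an optional nadir view."""
    cams = []
    for k in range(n_oblique):
        a = 2.0 * math.pi * k / n_oblique
        c = (target[0] + radius * math.cos(a),
             target[1] + radius * math.sin(a),
             target[2] + height)
        cams.append((c, look_at(c, target)))
    if nadir_height is not None:
        c = (target[0], target[1], target[2] + nadir_height)
        # Looking straight down, world Z is parallel to the view axis: use Y as "up".
        cams.append((c, look_at(c, target, up=(0.0, 1.0, 0.0))))
    return cams


def focal_for_gsd(distance, gsd):
    """Focal length in pixels giving gsd (metres per pixel) at distance (metres)."""
    return distance / gsd


def project(point, center, R, f, cx, cy):
    """Pinhole projection; returns None for points behind the camera."""
    pc = [_dot(row, _sub(point, center)) for row in R]
    if pc[2] <= 0.0:
        return None
    return (f * pc[0] / pc[2] + cx, f * pc[1] / pc[2] + cy)


target = (0.0, 0.0, 0.0)
cams = camera_ring(target, radius=30.0, height=15.0, nadir_height=40.0)
f = focal_for_gsd(math.hypot(30.0, 15.0), 0.02)  # about 1677 px for 2 cm at the target
print(len(cams), round(f))
print(project(target, cams[0][0], cams[0][1], f, 960.0, 540.0))
```

Because every camera is aimed at the target, the target projects onto the principal point of each view. On oblique views the GSD varies across the image, so the nominal 2 cm value holds only at the target distance.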

Orientation of the terrestrial images
Given the crowdsourced nature of the touristic images (weak camera network, large baselines, etc.), some video frames were added to ensure a good orientation of the terrestrial dataset. Initial attempts to orient the images used the traditional SfM approaches implemented in COLMAP and Agisoft Metashape (Figure 5a and 5b): the bundle adjustment oriented 164 and 215 of the 337 images, respectively. Image correspondences were then extracted using the SuperPoint+LightGlue implementation available in DIM. To identify candidate image pairs, DIM first performs brute-force matching at low resolution (1000 px). Keypoints are then extracted from the full-resolution images through a tiling procedure, and corresponding tiles are identified from the initial low-resolution matches. DIM writes the matches to a COLMAP-compatible database, from which they are imported to run the bundle adjustment. Using this approach, 299 of the 337 images were oriented (Figure 5c and Figure 6). As convergence of the bundle adjustment alone does not guarantee accurate image orientation, a natural point (Figure 7) was selected as a check point to assess the accuracy of the adjustment. The check point is visible in only a subset of the images, since the fully developed foliage and unfavourable viewing angles occlude it in the others. With SuperPoint+LightGlue, the check point is visible in 107 images with an average reprojection error of 2.04 pixels; with Metashape, the average reprojection error is 2.66 pixels over 62 projections. In both cases, projections with a reprojection error exceeding 5 pixels were excluded from the calculation.
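The check-point evaluation described above can be sketched in a few lines of Python. The sketch assumes the adjusted check point has already been transformed into each camera's coordinate frame and uses a one-parameter radial distortion model (the form of COLMAP's SIMPLE_RADIAL camera); the observations are invented for illustration. Errors above 5 px are discarded before averaging, as in the text.

```python
import math

# Minimal sketch of a check-point evaluation. Assumptions: points are given in
# camera coordinates (z forward) and a single radial coefficient k models
# distortion; crowdsourced images might require a richer camera model.


def project_simple_radial(point_cam, f, cx, cy, k):
    """Project a camera-frame point to pixel coordinates."""
    x, y = point_cam[0] / point_cam[2], point_cam[1] / point_cam[2]
    d = 1.0 + k * (x * x + y * y)  # one-parameter radial distortion factor
    return (f * x * d + cx, f * y * d + cy)


def mean_reprojection_error(projected, measured, max_error=5.0):
    """Mean pixel error over projections whose error does not exceed max_error."""
    errors = [math.dist(p, m) for p, m in zip(projected, measured)]
    kept = [e for e in errors if e <= max_error]
    if not kept:
        return float("nan"), 0
    return sum(kept) / len(kept), len(kept)


# Hypothetical observations of the same check point in three images.
projected = [(1460.0, 540.0), (812.5, 300.0), (100.0, 100.0)]
measured = [(1461.0, 540.0), (810.5, 300.0), (107.0, 100.0)]
print(mean_reprojection_error(projected, measured))  # (1.5, 2)
```

The third observation (7 px) is rejected by the 5 px threshold, so the mean is taken over the two remaining projections.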

Orientation of the video frames
Each video was processed with DIM (SuperPoint+LightGlue) together with the images rendered from the TLS point cloud. As illustrated in Figure 8, DIM co-registered the rendered images with the video frames in all four blocks, whereas Metashape succeeded for only two of the four datasets. Unlike COLMAP, Agisoft Metashape supports the introduction of a priori camera positions as additional observations in the bundle adjustment (BA). Therefore, the image blocks oriented with DIM were imported into Metashape in the Bundler format. After a BA constraining the positions of the synthetic rendered images to their known values, dense point clouds were generated from the frames of each video.
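To make the role of the known rendered-camera positions explicit, the sketch below writes a generic bundle adjustment objective with camera-position priors. It is an illustration under stated assumptions, not Metashape's internal formulation: sigma_px and sigma_pos are hypothetical weights and no optimizer is included. Because the priors are expressed in the TLS frame, a solution left in an arbitrary SfM frame (for example, at the wrong scale) is heavily penalised, which is how such constraints fix scale and datum.

```python
import math

# Minimal sketch of a bundle adjustment cost with camera-position priors.
# Each rendered camera adds a weighted penalty pulling its estimated centre
# towards the known TLS-frame position. sigma values are hypothetical weights.


def ba_cost(reproj_residuals, est_centers, prior_centers,
            sigma_px=1.0, sigma_pos=0.01):
    """Sum of squared, sigma-normalised image and camera-position residuals.

    reproj_residuals: (du, dv) pixel residuals of all image observations.
    est_centers / prior_centers: camera centres (metres) of the rendered views.
    """
    image_term = sum((du * du + dv * dv) / sigma_px ** 2
                     for du, dv in reproj_residuals)
    prior_term = sum(math.dist(c, p) ** 2 / sigma_pos ** 2
                     for c, p in zip(est_centers, prior_centers))
    return image_term + prior_term


priors = [(10.0, 0.0, 5.0), (0.0, 10.0, 5.0)]
# A block left in an arbitrary SfM frame (here: scaled by 0.5) pays a large
# penalty, so minimising the cost also pulls the block to the TLS scale.
scaled = [(0.5 * x, 0.5 * y, 0.5 * z) for x, y, z in priors]
print(ba_cost([(0.3, -0.4)], priors, priors))  # image term only
print(ba_cost([(0.3, -0.4)], scaled, priors))
```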

Dense point cloud generation
Dense point clouds were generated in Metashape for each video from the oriented, scaled and georeferenced video frames. To verify the scaling and referencing accuracy achieved by integrating the rendered images, a section of dry-stone wall in the TLS data was used as a reference to evaluate the quality of the photogrammetric point clouds. To mitigate small gaps in the data, a mesh was generated from the TLS point cloud subsampled at a resolution of 5 mm. Unsigned point-to-mesh distances were then computed using CloudCompare (Figure 9).
Table 1: Mean and standard deviation of the observed differences between photogrammetric point clouds from video frames and the TLS mesh over an unchanged section of dry-stone wall.
Given the low quality of the available videos, with an estimated GSD of 2-5 cm, mean errors between 2.4 and 4.0 cm and standard deviations between 1.8 and 3.0 cm are satisfactory results. Additionally, Figure 9 shows that these distances are significantly lower on the vertical face of the wall. A substantial portion of the errors can be attributed to occlusions at the top of the wall, which is irregular and challenging to reconstruct from video frames. Figure 10 shows the dense point clouds of the entire tree from the available videos. The results form a multi-temporal collection, with the canopy captured in spring/summer and the trunk and branches in fall/winter. Specifically, videos 1, 2 and 3 enable an almost complete reconstruction of the canopy, providing a comprehensive representation of this part of the tree. In contrast, the winter video 4 captures the trunk and branches, but its dense cloud is quite noisy because of the limited image resolution, high compression levels, and the fact that branches are thin elements requiring a fine GSD. Nonetheless, the main sections of the trunk appear adequately reconstructed given the available data.
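The comparison with the TLS wall mesh reduces to an unsigned point-to-triangle distance for every photogrammetric point, followed by the mean and standard deviation of those distances. The Python sketch below uses brute force over all triangles and the closest-point-on-triangle test from Ericson's "Real-Time Collision Detection"; it is a stand-in for CloudCompare's cloud-to-mesh tool, whose implementation differs, and the toy wall mesh is hypothetical.

```python
import math
import statistics

# Minimal sketch of unsigned point-to-mesh distances (brute force over all
# triangles; acceptable for a small wall section, not for large meshes).


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _add_scaled(a, u, s):
    """Return a + s * u."""
    return tuple(p + s * q for p, q in zip(a, u))


def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle abc, via Voronoi-region tests (Ericson)."""
    ab, ac, ap = _sub(b, a), _sub(c, a), _sub(p, a)
    d1, d2 = _dot(ab, ap), _dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a  # vertex region A
    bp = _sub(p, b)
    d3, d4 = _dot(ab, bp), _dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b  # vertex region B
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return _add_scaled(a, ab, d1 / (d1 - d3))  # edge AB
    cp = _sub(p, c)
    d5, d6 = _dot(ab, cp), _dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c  # vertex region C
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return _add_scaled(a, ac, d2 / (d2 - d6))  # edge AC
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return _add_scaled(b, _sub(c, b), w)  # edge BC
    denom = 1.0 / (va + vb + vc)
    v, w = vb * denom, vc * denom
    return _add_scaled(_add_scaled(a, ab, v), ac, w)  # face interior


def point_to_mesh_distance(p, vertices, triangles):
    """Unsigned distance from p to the nearest triangle of the mesh."""
    return min(math.dist(p, closest_point_on_triangle(p, *(vertices[i] for i in tri)))
               for tri in triangles)


def distance_stats(points, vertices, triangles):
    """Mean and (population) standard deviation of point-to-mesh distances."""
    d = [point_to_mesh_distance(p, vertices, triangles) for p in points]
    return statistics.mean(d), statistics.pstdev(d)


# Hypothetical 1 m x 1 m vertical wall face in the XZ plane, split into two triangles.
V = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
T = [(0, 1, 2), (0, 2, 3)]
print(distance_stats([(0.6, 0.02, 0.3), (0.25, -0.04, 0.75)], V, T))
```

For real clouds, a spatial index over the triangles would replace the brute-force minimum.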

Co-registration of trunk and branches
Given the poor resolution and high compression of video 4 and the resulting noisy point cloud, the TLS survey of the felled tree was used to create a complete virtual Sycamore Gap tree. First, leaves and branches/trunk in the TLS cloud were segmented (Figure 11a-b) using LeWoS (Wang et al., 2020), an automatic tool that separates leaf and wood components based only on geometric information. Second, the TLS point cloud of the felled tree was registered to the noisy point cloud from video 4, in which the tree is still standing. This step is necessary to correctly align the cut (lying) portion of the tree with the stump remaining in the ground. Since the TLS data had to be registered to a noisy and somewhat sparse photogrammetric point cloud, a subsampled portion of the TLS point cloud (blue portion in Figure 11c) was used. An Iterative Closest Point (ICP) algorithm without scale estimation (since all data had already been scaled) yielded an RMSE of 5.3 cm, which is considered acceptable given the quality of the photogrammetric data used to generate the point cloud. Figures 11b-c show the entire tree from TLS registered to the point cloud from video 4. This co-registration allowed the generation of a complete point cloud of the virtual Sycamore Gap tree, combining canopy information from the videos with the form of the trunk and branches from TLS (Figure 12a). Notably, there is a gap of ca. 50 cm between the remnants of the stump and the re-erected felled tree: this is due to clearance and sampling operations performed on site by the National Trust between the felling on 27th September and the TLS acquisition on 11th October 2023. The gap could be filled with the 3D results from the UAV video sequences, but their poor resolution and coverage did not allow a full reconstruction and integration (Figure 12b). The use of further terrestrial images will be investigated for this purpose in due course.
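The registration of the TLS tree to the video 4 cloud relied on ICP without scale estimation. A minimal rigid point-to-point ICP is sketched below in Python with NumPy (an added dependency): nearest neighbours are found by brute force and each update uses the SVD-based (Kabsch) solution. Like any ICP, it assumes a reasonable initial alignment, and it is not the implementation used in this work. The synthetic example uses noise-free data, so its RMSE drops to almost zero, unlike the 5.3 cm obtained against the noisy photogrammetric cloud.

```python
import numpy as np

# Minimal sketch of rigid (no scale) point-to-point ICP. Brute-force nearest
# neighbours suit a small subsampled cloud; larger clouds would need a k-d tree.


def best_rigid_transform(src, dst):
    """Kabsch/SVD: R, t minimising sum ||R @ src_i + t - dst_i||^2, no scale."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs


def _match(moved, target):
    """Nearest target point for every moved point, and the resulting RMSE."""
    d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    nn = d2.argmin(axis=1)
    rmse = float(np.sqrt(d2[np.arange(len(moved)), nn].mean()))
    return nn, rmse


def icp_rigid(source, target, max_iter=50, tol=1e-10):
    """Align source to target; returns R, t and the RMSE at the returned pose."""
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_rmse = np.inf
    for _ in range(max_iter):
        moved = source @ R_total.T + t_total
        nn, rmse = _match(moved, target)
        if prev_rmse - rmse < tol:
            break
        prev_rmse = rmse
        R, t = best_rigid_transform(moved, target[nn])
        # Compose the incremental update with the accumulated transform.
        R_total, t_total = R @ R_total, R @ t_total + t
    _, rmse = _match(source @ R_total.T + t_total, target)
    return R_total, t_total, rmse


rng = np.random.default_rng(0)
source = rng.uniform(0.0, 1.0, size=(150, 3))
a = np.radians(1.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.005, 0.005])
target = source @ R_true.T + t_true
R, t, rmse = icp_rigid(source, target)
print(rmse)
```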

Web-based platform
To enhance public access to the 3D results of Sycamore Gap, a web-based platform powered by Potree (Schütz, 2016) was developed. Potree is an open-source JavaScript library based on WebGL that enables efficient rendering of large 3D point clouds in a web browser. The platform allows users to interact dynamically with the 3D data, perform measurements, add annotations and extract cross-sections. Meshes, georeferenced images and other geospatial data are also supported, allowing seamless integration and the creation of rich 3D environments for comprehensive analysis and/or storytelling (Gaspari et al., 2024; Fascia et al., 2024). The web platform (Figure 13) allows Sycamore Gap to be visualized at three distinct epochs: (i) its original pre-vandalism state (reconstructed from crowdsourced material), (ii) its condition immediately after the vandalism, with the tree lying on the ground (from TLS data acquired on 11th October 2023), and (iii) its present condition with only the stump remaining (from UAV data acquired in a subsequent survey on 16th April 2024). Within the 3D scene, the archival images used are shown as hotspots, positioned according to the locations and orientations estimated by the photogrammetric processing.
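Placing an image hotspot requires the camera position and viewing direction of each oriented image. Assuming the poses are exported from COLMAP in its text format, where the first line of each image entry in images.txt reads "IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME" and (q, t) map world to camera coordinates with the camera looking along +z, the camera centre is C = -R^T t and the viewing direction is R^T [0, 0, 1]. The sketch below computes both; the output dictionary is a hypothetical intermediate format, not a Potree API.

```python
# Minimal sketch: camera centre and viewing direction from a COLMAP images.txt
# pose line, as input for placing image hotspots in a web viewer.


def quat_to_rot(qw, qx, qy, qz):
    """Row-major rotation matrix from a unit quaternion (w, x, y, z)."""
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qw * qz), 2 * (qx * qz + qw * qy)],
        [2 * (qx * qy + qw * qz), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qw * qx)],
        [2 * (qx * qz - qw * qy), 2 * (qy * qz + qw * qx), 1 - 2 * (qx * qx + qy * qy)],
    ]


def hotspot_from_colmap_line(line):
    """Return name, camera centre C = -R^T t and viewing direction R^T [0, 0, 1]."""
    fields = line.split(maxsplit=9)
    qw, qx, qy, qz, tx, ty, tz = map(float, fields[1:8])
    R = quat_to_rot(qw, qx, qy, qz)
    t = (tx, ty, tz)
    center = tuple(-sum(R[r][c] * t[r] for r in range(3)) for c in range(3))
    direction = tuple(R[2])  # third row of R = camera +z axis in world coordinates
    return {"name": fields[9], "position": center, "direction": direction}


print(hotspot_from_colmap_line("1 1 0 0 0 1 2 3 1 sycamore_001.jpg"))
```

The image name is hypothetical; with an identity rotation the camera centre is simply the negated translation.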
The viewer provides a comprehensive and contextualized visual timeline of the tree's transformation, together with a multi-perspective experience of the tree across different seasons and lighting conditions. Integrating the WebVR library into Potree could enable users to explore the scene through immersive, first-person VR experiences.

CONCLUSIONS
The aim of this work was to preserve the memory of a lost cultural heritage asset using crowdsourced images and videos, not only to digitally reconstruct the lost object for documentary purposes but also to share its memory through a web-based visualization tool. The available multi-temporal images, with varying viewing angles and illumination, were difficult to match correctly using traditional algorithms (e.g. SIFT). Therefore, SuperPoint+LightGlue, a deep learning-based local feature extractor and matcher trained on complex datasets, was employed. This oriented about 40% more images than the best traditional result (299 versus 215 of 337 images with Metashape), with improved accuracy at the check point (2.04 versus 2.66 pixels average reprojection error). To advance the processing, terrestrial images and video frames were matched with a few images rendered from the TLS data: this directly scaled and referenced the photogrammetric results with a significant reduction in matching time. The work has once again demonstrated the potential of photogrammetry, now supported by AI-based methods, for the virtual reconstruction of lost heritage. The legacy of the Sycamore Gap tree is now available to the community through a Potree-based project in which point clouds and images are displayed together. As future work, we plan to test other AI-based methods, such as NeRF or Gaussian Splatting, to evaluate their performance and limitations in digitally preserving the memory of lost heritage from available crowdsourced data.

Figure 3: View of the TLS point cloud of the felled tree and its stump captured on 11th October 2023.

Figure 4: Examples of rendered images from the TLS point cloud (left) and camera network of the rendered views (right).

Figure 5: Planimetric views of the oriented terrestrial datasets.

Figure 6: View of the internet image block oriented (299 / 337) using SuperPoint+LightGlue and COLMAP bundle adjustment. The different heights of the pyramids representing the camera poses denote the different focal lengths of the crowdsourced images.

Figure 7: Manual localization of the check point on the Sycamore Gap tree in some of the available images.

Figure 8: Orientation of some of the UAV datasets with the rendered TLS images (central circular sequence around the stump).

Figure 9: Observed differences between video-based photogrammetric point clouds and the TLS mesh over an unchanged section of the wall.

Figure 10: Dense point clouds generated from each video sequence, previously referenced and scaled using the TLS rendered images. Videos cover different years and seasons, showing the canopy in spring/summer and the trunk in fall/winter.
Figure 12: Overall point cloud of the Sycamore Gap tree reconstructed by merging the results from the videos and the edited TLS data (a). The gap in the (TLS) trunk filled with 3D data from the videos (b).