MACHINE LEARNING FOR THE DOCUMENTATION, PREDICTION, AND AUGMENTATION OF HERITAGE STRUCTURE DATA

The paper presents an effort to develop learning models based on the massive amounts of data that have been accumulated over the past decades during the digital documentation of heritage structures around the globe, especially those in disaster zones. The development of an ontology is proposed that describes heritage buildings, their sites, and major hazard events that may cause damage to them. This ontology can serve as a repository for documenting heritage structures and provide highly structured data for developing machine learning systems that can identify patterns of damage from recorded image data. For heritage structures in seismic zones, the first step in ontology development is analyzing the available information about the earthquake event and the damage it caused. The resulting model will create links between information items, for example relating the extent of the damage of an element to the earthquake magnitude and its distance from the epicenter. The ontology may also include collected images from previous earthquake events, with links to the objects in each image. Special tools will focus on selecting sub-models to be included in a machine learning model. For example, if the learning objective is to identify the damage and its extent from an image, then the rules will select the features in the model that relate to structural damage and identify each type of damage. It is hoped that this work will help develop learning systems that speed up processing of large volumes of image damage data collected from heritage sites.


INTRODUCTION
During the past thirty years the world has experienced unprecedented levels of devastation, loss of innocent lives, and staggering losses caused by catastrophic earthquakes, as presented in Table 1.
Table 1. Major earthquakes over the last 30 years
The resulting human toll of hundreds of thousands of lives lost, together with economic losses amounting to billions of dollars, has seriously affected countries' economies.
With the recent advancements in digital technologies and tools available for the documentation of heritage structures (e.g. satellites, UAV, TLS, drones, GPS, and GIS), massive volumes of data have been collected over recent decades during field reconnaissance surveys and the documentation of observed damage following devastating earthquakes (Hutchinson, 2017; Hadick, 2022). The data cover a wide range of heritage structures in many regions around the world. The sheer volume of data makes it difficult to comprehend the extent of coverage and to put the data to practical use.
The earthquake hazard assessment and seismic retrofit of buildings is a complex problem on both the architectural and engineering levels. It involves many related issues and variables that cannot be considered in isolation. Some of the attributes may be only partially defined, and some of the required information may be incomplete or missing. Decisions have to be made in the face of such uncertainties. There is a need and an opportunity to develop learning models based on the massive volumes of data acquired through detailed digital documentation of heritage structures, especially in disaster zones around the globe. The paper presents an ontology-based environment for critical assessment of the available digital data from the heritage documentation process.
Handling such a large volume of data requires a common representational model, such as an ontology, to relate the data elements and identify overarching concepts that are present within the data. The common ontology facilitates the development of systems that provide analysis of the data and extraction of useful patterns, such as machine learning systems. The results will provide an intelligent framework for systematically deriving and updating lessons from observed earthquake damage data, and from databases of heritage documentation processes.
The intelligent framework will manage the complexity of analysis, in collaboration with the human user, by creating information models for buildings, over which deep learning systems can operate to analyze the building conditions and provide feedback to users. The internal model is based on a domain ontology to represent building elements and the rich and complex relationships that exist among them. By embedding knowledge into the building analysis environment, we provide context for the analysis tools to produce more meaningful results and recommendations to the user. The information model represents the building under analysis and the deep learning systems examine different aspects of the building and its affected elements, such as significant architectural features and elements (e.g. domes, bell towers, facades, ornamentations, etc.) and assess their cultural value and their need for restorations.

OBJECTIVES
This paper proposes to develop an ontology to represent all aspects of heritage structures and their values, and use the ontology to generate machine learning (ML) models for specific types of analyses as needed. The ontology allows for the definition of rules that specify required parts of the general model to be included in each machine learning model. There are three components to this proposed development: 1) the ontology, 2) mapping tools and rules, and 3) the learning system platform. Figure 4 shows the general architecture of the proposed system.

PREVIOUS WORK
Over the past ten years the authors have been actively pursuing the development of intelligent knowledge-based systems for extraction of lessons from the massive body of knowledge that has been accumulated from the observations of the performance of existing buildings in general, and cultural heritage structures in particular, during catastrophic earthquakes around the world over the past sixty years and beyond (Rihal & Assal, 2012).
In the early work, an intelligent framework was developed to support the critical assessment of the available body of knowledge and the development of intelligent design agents that serve as analysis tools in different areas of the seismic design process (Rihal & Assal, 2016). This work was followed by the development of an ontology-based environment for earthquake analysis and design, involving identification and integration of data sources, a unified model of building information and risk information, intelligent analysis tools, and a system for providing access to the unified model for data consumers (Rihal et al. 2020).
An effort to better understand the types of data needed for machine learning systems led to further research into the information needs of a deep learning system for the assessment and restoration of earthen heritage structures (Rihal & Assal, 2022). Recent comprehensive work on automated processing of earthquake damage image data has been presented by (Yeum et al., 2016) and (Yeum et al., 2017). It is based on machine learning and computer vision, and classifies and organizes large volumes of visual data in support of post-earthquake reconnaissance.
An annotated damage image database to support AI-assisted hurricane impact analysis has been presented by (Ou et al., 2021).
A deep learning algorithm was implemented by (Patterson et al., 2018). It supports automated image classification of seismic damage to built infrastructure, and identification of multiple damage types and associated structural members in a single image. The recent works cited above deal with seismic damage data for structures in general and do not include damage data for heritage and historical buildings.

HERITAGE DATA
Heritage structures represent great value for their societies and serve as documentation of history and culture. The damage caused by earthquake events to heritage structures may lead to the loss of this documentation and of the cultural value associated with it. It is imperative that such structures be preserved, documented, and restored in the event of damage. In addition to structural details of building elements, heritage data include dates, historic significance, association with the prevailing culture at the time of construction, and other important aspects. The diversity of data formats (e.g. pictures, videos, textual descriptions) and of the organizations performing the documentation work requires a standard for describing the elements of structures, so that collaborative work can take place and restoration work has a point of reference for how the structure looked before the damage.

Advanced technologies for digital documentation of heritage structures
Over the past fifty years great progress has taken place in the application of advanced technologies, e.g. UAV, drones, terrestrial photogrammetry, aerial photogrammetry, terrestrial laser scanning (TLS), airborne laser scanning (ALS), LiDAR, cameras, GIS, etc., as evidenced by (Santana, 2022); (Jigyasu et al., 2022); (Kallas, 2021); (ADRC, 2016); (Croce et al., 2021); (Servin, 2010); (ICCROM, 2022); (Bartlett et al., 2020); (Altan et al., 2001); (Chatzistamatis et al., 2018); (Stepinac et al., 2020); (Rouhani et al., 2020); (Monical, 2020); (Costamagna et al., 2019); (Kaartinen et al., 2022); (Shrestha et al., 2017). Excellent contributions and leadership in the field of information technologies and 3D digital documentation of cultural heritage before and after disasters and pandemics around the world have been presented by (Santana, 2021). The emergency 3D documentation and photogrammetry survey of the cultural heritage buildings damaged by the 2020 explosions in Beirut, the identification of buildings at high risk of collapse, and the development of conservation plans for the heritage buildings have been presented by (Kallas, 2021). A project on monitoring endangered archaeology in the Middle East and North Africa (EAMENA) has been presented by (Rouhani, 2020). This project involved entering digital records of archaeological sites into the EAMENA database to improve the inventory, management, and preservation of those sites. The sources of information for this project are remote sensing; published reports, existing databases, and archives; and data collected from field surveys and field assessment.
In general, the digital workflow can be represented as shown in Figure 5. A great deal of work has been carried out in recent years on the reconnaissance and 3D documentation of heritage structures damaged by earthquakes and other disasters, as outlined in the references cited above.

Digital Documentation of Heritage Structures damaged during earthquakes
The most interesting type of documentation of heritage structures is digital 3D laser scanning and photogrammetry.

Metropolitan Cathedral, Mexico City - 3D Digital Documentation - CyArk:
One of the important sources of heritage 3D data is CyArk, as presented by (Hadick, 2022). See also (Preciado et al., 2022), (Preciado et al., 2020), (Weiser et al., 2018). A total of 58 temples were visited: 22 in Oaxaca, around the Mixteca Alta and Tehuantepec regions; 11 inside the Mixteca region, Puebla; 15 in Morelos; and 10 in Mexico City. Some of these temples are included in the UNESCO World Heritage List. A preliminary seismic damage and vulnerability assessment of Mexican churches after the September 2017 earthquakes was also presented by (Diaz et al., 2019). A database was developed by the team at UNAM, using Microsoft Access, to manage the large amount of information collected during the damage survey and reconnaissance program. The database can be searched in three different ways: by state or location, by type of roof system, and by shape of the nave. Once a search is selected, the list of churches that match the query criteria is displayed. Clicking on the name of a church opens its record, which is divided into five sections. The first section displays the name of the church and its location. The second section holds the photos of the building available in the database: for each record, there are photos of the state of the structure before the 2017 earthquakes, of the damage caused by these earthquakes, and of the rehabilitation project (where information about it is available). The third section contains complementary files, such as documents, videos, or drawings. The fourth section gives information about the characteristics of the church, as well as the damage reported for each macroelement: façade, nave, transept, apse, and dome. Finally, the record ends with a brief overview of the damage observed in the building, as well as the reinforcing actions implemented (where information about them is available).
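The record layout and the three search criteria described above can be sketched in a few lines of Python. This is only an illustration, not the UNAM implementation (which uses Microsoft Access): the class name `ChurchRecord`, the function `search_churches`, all field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChurchRecord:
    # Section 1: identification and location
    name: str
    state: str
    # Section 2: photos before the earthquakes, of the damage, and of the rehabilitation
    photos_before: List[str] = field(default_factory=list)
    photos_damage: List[str] = field(default_factory=list)
    photos_rehabilitation: List[str] = field(default_factory=list)
    # Section 3: complementary files (documents, videos, drawings)
    complementary_files: List[str] = field(default_factory=list)
    # Section 4: characteristics and damage reported per macroelement
    roof_system: str = ""
    nave_shape: str = ""
    damage_by_macroelement: Dict[str, str] = field(default_factory=dict)
    # Section 5: damage overview and reinforcing actions
    damage_overview: str = ""
    reinforcing_actions: str = ""


def search_churches(
    records: List[ChurchRecord],
    state: Optional[str] = None,
    roof_system: Optional[str] = None,
    nave_shape: Optional[str] = None,
) -> List[ChurchRecord]:
    """Return records matching every criterion that is not None."""
    return [
        r for r in records
        if (state is None or r.state == state)
        and (roof_system is None or r.roof_system == roof_system)
        and (nave_shape is None or r.nave_shape == nave_shape)
    ]


# Made-up sample records, for illustration only.
records = [
    ChurchRecord(name="Church A", state="Oaxaca",
                 roof_system="barrel vault", nave_shape="single nave"),
    ChurchRecord(name="Church B", state="Puebla",
                 roof_system="barrel vault", nave_shape="Latin cross"),
    ChurchRecord(name="Church C", state="Morelos",
                 roof_system="timber roof", nave_shape="single nave"),
]

for record in search_churches(records, roof_system="barrel vault"):
    print(record.name, record.state)
```

Keeping the five sections as explicit fields makes it straightforward to map such records onto ontology classes later.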
Further details of the UNAM database are presented elsewhere. Details of the functioning of the Da.D.O. section on churches, a detailed description of the databases, and the survey forms used in post-earthquake inspections of churches, from which the databases were derived, are presented in (Meo et al., 2023), including some results about the most common typologies, structural characteristics, and the damage mechanisms that were activated during the seismic events. An analysis of the observed damage to thirty-six masonry churches during the 2016 Central Italy earthquake has been presented by (Ferracuti et al., 2022). The analysis highlighted the most frequent damage mechanisms and the most vulnerable macroelements, and included the development of a damage index based on the observed damage and the macroelements present in the surveyed churches.
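The formula of the damage index is not given here, so the sketch below is only one plausible form and should not be read as the index of the cited study. It assumes that each macroelement present in a church receives an integer damage level between 0 and a maximum level, and it normalizes the mean level to the range 0 to 1. The function name, the level scale, and the sample values are all assumptions.

```python
from typing import Dict


def damage_index(damage_levels: Dict[str, int], max_level: int = 5) -> float:
    """Normalized mean damage over the macroelements present (assumed form).

    damage_levels maps each macroelement that exists in the church to its
    observed damage level, from 0 (no damage) to max_level (collapse).
    Macroelements that the church does not have are simply left out.
    """
    if not damage_levels:
        raise ValueError("at least one macroelement is required")
    for element, level in damage_levels.items():
        if not 0 <= level <= max_level:
            raise ValueError(f"damage level out of range for {element}: {level}")
    return sum(damage_levels.values()) / (max_level * len(damage_levels))


# Made-up observation for a church without a transept.
observed = {"facade": 3, "nave": 2, "apse": 0, "dome": 5}
print(damage_index(observed))  # 10 / (5 * 4) = 0.5
```

Dividing by the number of macroelements actually present keeps churches with different layouts comparable, which is the role the text assigns to "the macroelements present".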

Information Sources
Many organizations maintain databases that track and record earthquake information in different ways, ranging from general earthquake event data, such as location, epicenter, and magnitude (e.g. USGS), to data on damage to structures and, more specifically, damage to cultural heritage buildings (e.g. World Monuments Fund, ICOMOS, CyArk, UNAM, INAH). Figure 6 lists some of the more prominent data sources around the world that provide access to researchers and earthquake experts.
One of the first disaster and failure studies data repositories was the NIST database for the 2010 Chile earthquake, as presented by (Catlin and Pujol, 2015).
A general-purpose natural hazard image database covering different hazards (e.g. earthquake, tsunami, volcano, landslide, geology) is hosted by (NOAA, 2023).

ONTOLOGY DEVELOPMENT
An ontology is a description of a field of study in terms of objects and their relationships, together with rules of inference that describe implicit relationships. It may be developed as a hierarchy or as a graph-like structure. A well-developed ontology is the basis of intelligent applications and tools, and it also supports the creation of machine learning models for many purposes. We propose the development of an ontology that describes heritage buildings, their sites, and major hazard events that may cause damage to them. This ontology can serve as a repository for documenting heritage structures and provide highly structured data for developing machine learning systems that can identify patterns of damage from images, predict the type of damage that may occur in a given event, such as an earthquake, and support efforts of renovation, restoration, or conservation. The ontology has four main areas: buildings, hazard events, sites, and regulations. It represents all the available information about structures in earthquake zones, related to earthquake events and the damage that may have been caused by them. The regulations component of the ontology reflects the cultural value of structures through the requirements for preservation and repair efforts that must be performed on damaged structures. The cultural value can be separated into its own section of the ontology to reflect values that may be designated by nongovernmental organizations.
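As a minimal sketch of these ideas, the four areas can be written as classes, the relationships as subject-predicate-object triples, and a rule of inference as a function that derives an implicit relationship. The representation, the predicate names (`located_in`, `affected`, `exposed_to`), and the instance names are illustrative assumptions; a production ontology would more likely be expressed in a dedicated language such as OWL.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The four main areas of the proposed ontology, as class names.
AREAS = {"Building", "HazardEvent", "Site", "Regulation"}

# Explicit facts; every name below is a hypothetical instance.
facts: Set[Triple] = {
    Triple("cathedral_1", "is_a", "Building"),
    Triple("site_1", "is_a", "Site"),
    Triple("quake_1", "is_a", "HazardEvent"),
    Triple("regulation_1", "is_a", "Regulation"),
    Triple("cathedral_1", "located_in", "site_1"),
    Triple("quake_1", "affected", "site_1"),
    Triple("regulation_1", "applies_to", "cathedral_1"),
}


def infer_exposure(known: Set[Triple]) -> Set[Triple]:
    """Rule: a building located in a site that an event affected
    is implicitly exposed to that event."""
    located = [(t.subject, t.obj) for t in known if t.predicate == "located_in"]
    affected = [(t.subject, t.obj) for t in known if t.predicate == "affected"]
    return {
        Triple(building, "exposed_to", event)
        for building, site in located
        for event, affected_site in affected
        if site == affected_site
    }


inferred = infer_exposure(facts)
for triple in sorted(inferred):
    print(triple.subject, triple.predicate, triple.obj)
```

The `exposed_to` relationship is never stored explicitly; it follows from the rule, which is the sense in which an ontology describes implicit relationships.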
The four areas of the ontology will be connected to describe statements such as: "Structure 'A' in location 'B' sustained damage of elements [a, b, c] ..."
Rules can be declared in the ontology to define the conditions under which class objects can have a relationship to other objects in the same class or in other classes. These rules can be used to select a subset of the ontology to be included in a machine learning model. For example, an ML system that aims to learn patterns of structural damage in buildings may include attributes of the buildings class that describe structural elements, and elements of the hazard events class that describe the forces generated by each event and the stresses that impact the structural elements.
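One simple way to realize such selection rules is to tag every attribute with the concerns it relates to and let a rule pick the attributes carrying a given tag. The sketch below assumes this tagging scheme; the attribute names, tags, and the function `select_submodel` are hypothetical and not part of the proposed ontology.

```python
from typing import Dict, List, Set

# Attributes per ontology class, each tagged with the concerns it relates to.
ONTOLOGY_ATTRIBUTES: Dict[str, Dict[str, Set[str]]] = {
    "Building": {
        "wall_material": {"structural"},
        "roof_system": {"structural"},
        "construction_year": {"historical"},
        "ornamentation": {"cultural"},
    },
    "HazardEvent": {
        "magnitude": {"structural"},
        "epicentral_distance_km": {"structural"},
        "event_name": {"descriptive"},
    },
    "Regulation": {
        "protection_level": {"cultural"},
    },
}


def select_submodel(
    attributes: Dict[str, Dict[str, Set[str]]], required_tag: str
) -> Dict[str, List[str]]:
    """Keep, per class, only the attributes that carry the required tag."""
    selected: Dict[str, List[str]] = {}
    for class_name, attrs in attributes.items():
        kept = sorted(name for name, tags in attrs.items() if required_tag in tags)
        if kept:  # classes with no matching attribute are left out
            selected[class_name] = kept
    return selected


structural_model = select_submodel(ONTOLOGY_ATTRIBUTES, "structural")
print(structural_model)
```

For a structural damage learning objective this keeps the structural attributes of buildings and the force-related attributes of hazard events, and leaves out the regulations class entirely.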

METHODOLOGY
The ontology development starts with an analysis of available earthquake information, both event information and damage information. This analysis should lead to the design of a model that encompasses the aspects represented in that information. The model will create links between information items, such as relating the extent of damage of an element to the earthquake magnitude and the distance from the epicenter.
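A link of this kind can be illustrated by computing the epicentral distance of a structure with the haversine great-circle formula and attaching it, together with the magnitude, to a damage observation. The dictionary keys, coordinates, and magnitude below are made-up values, and using the surface great-circle distance is a simplifying assumption.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def link_damage_to_event(damage: dict, structure: dict, event: dict) -> dict:
    """Combine one damage observation with the event attributes it is linked to."""
    return {
        "element": damage["element"],
        "damage_level": damage["level"],
        "magnitude": event["magnitude"],
        "epicentral_distance_km": haversine_km(
            structure["lat"], structure["lon"], event["lat"], event["lon"]
        ),
    }


# Hypothetical values, for illustration only.
structure = {"name": "Structure A", "lat": 19.0, "lon": -99.0}
event = {"name": "Event E", "magnitude": 7.1, "lat": 18.5, "lon": -98.5}
damage = {"element": "dome", "level": 3}

row = link_damage_to_event(damage, structure, event)
print(row)
```

Each such row is one linked information item that a learning system could later consume.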
Tools will be developed to work with the model to allow ontology navigation. Some tools will allow exploring the objects and their relationships in the model. Other tools will allow the definition of rules or constraints to select specific objects of interest to examine. The ontology may also include collected images from previous events, with links to the objects in each image. This part may facilitate the creation of machine learning models later.
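An exploration tool of this kind could be as simple as a breadth-first walk over the relationships, limited to a chosen depth. The sketch below assumes the ontology instances are available as an adjacency mapping; the relationship names and instance names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# Each object maps to its outgoing (relationship, target) pairs.
RELATIONS: Dict[str, List[Tuple[str, str]]] = {
    "cathedral_1": [
        ("has_element", "dome_1"),
        ("has_element", "facade_1"),
        ("located_in", "site_1"),
    ],
    "dome_1": [("sustained", "damage_1")],
    "damage_1": [("caused_by", "quake_1"), ("shown_in", "image_001.jpg")],
}


def reachable(graph: Dict[str, List[Tuple[str, str]]],
              start: str, max_depth: int) -> Dict[str, int]:
    """Breadth-first exploration: every object within max_depth steps of
    start, mapped to the number of steps needed to reach it."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for _relationship, target in graph.get(node, []):
            if target not in depth:
                depth[target] = depth[node] + 1
                queue.append(target)
    return depth


print(reachable(RELATIONS, "cathedral_1", max_depth=1))
print(reachable(RELATIONS, "cathedral_1", max_depth=3))
```

With a depth of 1 the user sees only the direct elements and site of the building; a depth of 3 also reaches the event that caused the damage and the image in which it appears.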
Special tools will focus on selecting sub-models to be included in a machine learning model. For example, if the learning objective is to identify the damage and its extent from a picture, then the rules will select the features in the model that relate to structural damage and identify each type of damage. Figure 8 shows the general steps for creating a machine learning system. The proposed ontology and its tools will assist in the steps of 'feature extraction' and 'model development'.
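For the image-based objective in this example, the links between images and damaged objects can be turned into training targets. The sketch below assumes each link records an image, an element, and a damage type, and builds one multi-label vector per image; the record layout, file names, and damage type names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical links between images and the damaged objects they show.
IMAGE_LINKS = [
    {"image": "img_001.jpg", "element": "dome", "damage_type": "cracking"},
    {"image": "img_001.jpg", "element": "facade", "damage_type": "partial_collapse"},
    {"image": "img_002.jpg", "element": "nave", "damage_type": "cracking"},
]


def build_multilabel_targets(
    links: List[dict],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the sorted damage classes and, per image, a 0/1 vector
    marking which damage types are linked to that image."""
    classes = sorted({link["damage_type"] for link in links})
    position = {name: i for i, name in enumerate(classes)}
    targets: Dict[str, List[int]] = {}
    for link in links:
        vector = targets.setdefault(link["image"], [0] * len(classes))
        vector[position[link["damage_type"]]] = 1
    return classes, targets


classes, targets = build_multilabel_targets(IMAGE_LINKS)
print(classes)
print(targets)
```

A multi-label encoding is used because a single image may show several damage types at once.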

CREATION OF MODELS FOR ML SYSTEMS
Machine learning systems require large volumes of data prepared and organized into models that describe the system's interest (i.e., what the system is trying to learn) and the significant attributes of the objects involved. Given the variety of learning objectives and the number of machine learning systems that can be built with the collected data, it would be hard to develop each model by hand from scratch. The highly structured nature of the ontology allows us to build tools that would extract a desired model out of the relevant portion of the ontology. We propose building tools that would take in model requirements, generate an appropriate model from the ontology, and populate it with data as appropriate.
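A tool of this kind might accept the model requirements as a small specification naming the features and the target, and then populate a feature matrix from ontology instances. The sketch below assumes instances are available as flat dictionaries and simply skips incomplete ones, which is only one possible policy for missing information; the specification format, the function `build_dataset`, and the sample values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model requirements: which attributes to use and what to learn.
requirements = {
    "features": ["magnitude", "epicentral_distance_km", "wall_thickness_m"],
    "target": "damage_level",
}

# Made-up ontology instances; None marks a missing value.
instances: List[Dict[str, Optional[float]]] = [
    {"magnitude": 7.1, "epicentral_distance_km": 76.0,
     "wall_thickness_m": 1.2, "damage_level": 3},
    {"magnitude": 7.1, "epicentral_distance_km": 120.0,
     "wall_thickness_m": None, "damage_level": 1},
    {"magnitude": 8.2, "epicentral_distance_km": 40.0,
     "wall_thickness_m": 0.9, "damage_level": 4},
]


def build_dataset(
    spec: dict, records: List[dict]
) -> Tuple[List[List[float]], List[float]]:
    """Generate the feature matrix X and target vector y named in the spec,
    skipping records with any required value missing."""
    needed = spec["features"] + [spec["target"]]
    X: List[List[float]] = []
    y: List[float] = []
    for record in records:
        if any(record.get(key) is None for key in needed):
            continue  # incomplete record: skipped under this simple policy
        X.append([record[key] for key in spec["features"]])
        y.append(record[spec["target"]])
    return X, y


X, y = build_dataset(requirements, instances)
print(X)
print(y)
```

Changing the learning objective then only requires a different specification, not a new hand-built model.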

Figure 8. Steps of creating a machine learning system

OUTCOMES AND BENEFITS
This work will provide an overall ontology for earthquake (and general hazard) events, which will relate the events to the damage they cause to structures. It will also aid in developing learning systems that speed up the processing of large volumes of images collected from damaged sites, making more effective use of the massive amounts of recorded damage data in decision making for the conservation of cultural heritage buildings in hazard zones around the globe.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-M-2-2023. 29th CIPA Symposium "Documenting, Understanding, Preserving Cultural Heritage: Humanities and Digital Technologies for Shaping the Future", 25-30 June 2023, Florence, Italy