TECHNICAL FRAMEWORK AND PRELIMINARY PRACTICES OF GLOBAL GEOGRAPHIC INFORMATION RESOURCE CONSTRUCTION

Abstract: High-precision, real-time global geographic information data are fundamental strategic resources for safeguarding global strategic interests, studying global environmental change, and planning sustainable development. However, owing to difficulties in obtaining ground control and reference information, the construction of global geographic information resources faces significant hurdles in geometric positioning, information extraction, and data mining. Starting from the characteristics of domestically produced remote sensing images, this paper proposes a comprehensive technical framework centered on "uncontrolled geometric positioning, intelligent interpretation of typical elements, mining of multi-source overseas data, and intelligent hybrid collection and compilation of Digital Elevation Models (DEMs)." It describes the key technical challenges that must be overcome and the corresponding solutions, and outlines the development of related data products and production technical specifications. Several production-oriented software tools were developed and used to create data products of multiple types and scales, including global 30-meter land cover data, DEM data, and core vector data.


INTRODUCTION
Global geographic information resource data are an indispensable information and knowledge resource for engineering planning and construction, emergency response and rescue, ecological and environmental assessment and monitoring, global change research, and sustainable development planning. With the rapid development of economic globalization and informatization, and the urgent need to solve global sustainable development problems, building geographic information resources with global coverage has become an important part of the global strategies of developed countries. The United States, Europe, Japan, and other developed countries have long used various means to acquire global geographic information as a strategic reserve resource; they have successively launched high-resolution remote sensing satellites and processed the imagery into geographic information products. They now hold high-precision data covering the global land area and have formed a monopoly on such data. Examples include the National System for Geospatial Intelligence (NSG); global land cover products (ESRI_Landcover_2020 in the United States and ESA_WorldCover2020 of the European Space Agency); global digital elevation models (SRTM in the United States, ASTER in Japan, and the European WorldDEM); and terrain data products and services such as OSM data and Google Maps [Okolie C, Zhang Y J].
In China, high-precision geospatial data coverage is mainly limited to the national territory, and the construction of overseas geospatial data resources needs to be strengthened. China is also developing global geospatial information resources at different scales, with the goal of producing domestic, independent geospatial information products. However, because ground control is unavailable overseas, reference information is hard to obtain, and no single image source provides effective coverage, it is difficult to guarantee the accuracy and reliability of geometric positioning, the quality of information extraction, and the richness of attributes when constructing global geographic information resources; the global scale brings further technical challenges. Specifically: 1) in geometric positioning, calibration fields are lacking, and the time-varying characteristics of the systematic errors of domestic satellite images and the methods for calibrating them are insufficiently understood, so image positioning accuracy cannot meet mapping requirements; 2) in feature extraction, production still relies mainly on human-computer interactive visual interpretation, which is inefficient, subject to human factors, and cannot meet the requirements of global geographic information resource construction; 3) in attribute acquisition, feature data downloaded from multiple sources differ in form and have chaotic attribute structures, making them difficult to exploit fully [SHAN Jie]. In addition, regional differences around the world are significant, and a single image or other data source cannot provide effective coverage, so production technology for hybrid mapping from multiple data sources must be developed [ZHOU Xiaoguang]. In response to these challenges, we made technological innovations and broke through the key technologies for producing global geographic information data products. Under the main themes of "uncontrolled geometric positioning of domestic satellite images, intelligent interpretation of typical elements, multi-source geographic data mining, and intelligent hybrid compilation of DEMs," we carried out and completed the overall technical research on constructing global geographic information resources, formed an independent construction capability, and produced high-quality geographic information data worldwide.

MAIN TECHNICAL METHODS
The development of global geographic information resource products involves processing massive, petabyte-scale multimodal spatiotemporal data and faces technical challenges brought about by the global scale. To meet the demand for multi-source, multimodal spatiotemporal data in global geographic information data products, we developed a set of processing technologies and methods covering multi-source image geometric collaborative processing, typical element extraction and land cover classification through multimodal data fusion, multi-source data mining, and intelligent hybrid compilation (Figure 1). First, block adjustment and uncontrolled geometric positioning are completed with multi-source image geometric collaborative processing, while content information extraction and knowledge fusion are achieved through crowdsourced data mining and fusion. Second, using features from multiple data sources, elements are rapidly and automatically extracted and integrated with deep learning models, and DEM data are produced by intelligent hybrid compilation. Based on these methods, we developed a set of intelligent multi-source data processing algorithms and production workflows that support the production of global geographic information data products.
Figure 1 Overall Technical Framework

Collaborative geometric processing of satellite images without GCPs
Geographic information is extracted mainly from satellite images (such as ZY-3). Satellite remote sensing platforms are inevitably affected by internal and external factors, producing errors that reduce the uncontrolled geometric positioning accuracy of satellite images. At present, the operational positioning accuracy of 2.5-meter-resolution satellite stereo images cannot directly meet the 1:50,000 mapping accuracy requirements without ground control points [GONG Jianya, WANG Mi]. The main error sources affecting the geometric positioning accuracy of satellite images are attitude measurement errors and orbit measurement errors. Current efforts to improve uncontrolled positioning accuracy focus on two aspects: first, increasing the frequency of on-orbit geometric calibration and performing joint geometric calibration over multiple calibration sites; second, analyzing multi-source imagery and fusing the data to improve image accuracy. Hybrid block adjustment can reduce the positioning error [Tang X].
(1) High-frequency on-orbit geometric calibration
Owing to thermal environment changes and weightlessness during orbiting, sensor parameters change, so the geometric accuracy of satellite images must be improved through on-orbit geometric calibration, which includes external and internal calibration. Because the state and stability of on-board equipment change continuously with time and orbital period, the closer the imaging time is to the calibration time, the higher the geometric accuracy of the image product. Based on theoretical analysis and statistics from long-term sample data, and in response to the demand for high-precision geometric positioning of global images, the on-orbit geometric calibration frequency was set to once every 20 to 40 days. The results show that, compared with a calibration cycle of 2 to 3 months, the accuracy of the bias compensation matrix improved from 1.5" to better than 0.8", and the error of the interior orientation elements is less than 0.3 pixels.
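To put these calibration figures in ground terms, the short sketch below converts an angular error into an approximate nadir ground displacement (displacement = orbit height times angle in radians) and an interior-orientation error in pixels into metres. The orbit height of 505 km is an assumption chosen to be close to that of ZY-3, and the 2.5 m pixel size is the resolution quoted above; off-nadir viewing, terrain relief, and Earth curvature are ignored.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def ground_offset_m(angle_arcsec: float, orbit_height_m: float) -> float:
    """Approximate nadir ground displacement caused by an angular (pointing) error.

    Small-angle approximation: offset = H * theta. Off-nadir geometry,
    terrain relief and Earth curvature are ignored.
    """
    return orbit_height_m * angle_arcsec * ARCSEC_TO_RAD


def pixel_error_m(error_px: float, gsd_m: float) -> float:
    """Convert an image-space error in pixels into metres on the ground."""
    return error_px * gsd_m


if __name__ == "__main__":
    H = 505_000.0  # assumed orbit height in metres (close to ZY-3)
    for label, arcsec in [("2-3 month cycle", 1.5), ("20-40 day cycle", 0.8)]:
        print(f"{label}: {arcsec}\" -> {ground_offset_m(arcsec, H):.2f} m")
    print(f"0.3 px at 2.5 m GSD -> {pixel_error_m(0.3, 2.5):.2f} m")
```

Under these assumptions, the improvement from 1.5" to 0.8" corresponds to roughly 3.67 m versus 1.96 m of nadir ground displacement, and 0.3 pixels corresponds to about 0.75 m.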
(2) Multi-source image block adjustment assisted by generalized control data
After on-orbit calibration and other refinement processes, satellite image products are free of some systematic errors and various irregular distortions, but geometric accuracy remains inconsistent between different regions and between images from different orbits. To address this issue, the project proposes a large-scale stereo image block adjustment technique assisted by generalized control data to improve the overall geometric positioning accuracy of satellite images. From the perspective of engineering implementation, generalized control data include high-precision public geographic information data, control points, and satellite laser altimetry data, and the geometric adjustment models involved are mainly the rational function model (RFM) and the spaceborne laser altimetry error model.
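After refinement processing such as calibration, the imaging geometry of satellite images is generally expressed by a rational function model (RFM). The systematic error of the RFM is compensated with an image-space affine transformation model, which relates image point coordinates to ground point coordinates as follows:
$$
\begin{aligned}
x + a_0 + a_1 x + a_2 y &= F_x(P;\, B, L, H) \\
y + b_0 + b_1 x + b_2 y &= F_y(P;\, B, L, H)
\end{aligned}
\qquad (1)
$$
In the equation, $(x, y)$ are the image point coordinates, $(B, L, H)$ are the ground point coordinates (latitude, longitude, height), $a_0, a_1, a_2, b_0, b_1, b_2$ are the affine transformation parameters that compensate for the RFM systematic error of the image, $F_x$ and $F_y$ are the rational functions, and $P$ denotes the rational function model parameters.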
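The affine compensation parameters are usually estimated by least squares from points whose ground coordinates are known (control points or generalized control data). The sketch below assumes the common form in which the measured image coordinates (x, y) plus an affine correction a0 + a1*x + a2*y (and b0 + b1*x + b2*y) equal the RFM projection of the ground point. The RFM itself is not implemented: its projected coordinates are passed in, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_3x3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular normal matrix (points may be collinear)")
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine_compensation(
    measured: Sequence[Tuple[float, float]],
    projected: Sequence[Tuple[float, float]],
) -> Tuple[List[float], List[float]]:
    """Least-squares estimate of (a0, a1, a2) and (b0, b1, b2).

    measured:  observed image coordinates (x, y) of points with known ground coordinates
    projected: RFM projections (F_x, F_y) of those ground points (computed elsewhere)
    Each point gives a0 + a1*x + a2*y = F_x - x and b0 + b1*x + b2*y = F_y - y.
    """
    if len(measured) != len(projected) or len(measured) < 3:
        raise ValueError("need at least 3 matched points")
    ata = [[0.0] * 3 for _ in range(3)]  # normal matrix A^T A, shared by both axes
    atl_x = [0.0] * 3  # A^T l for the x residuals
    atl_y = [0.0] * 3  # A^T l for the y residuals
    for (x, y), (fx, fy) in zip(measured, projected):
        row = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atl_x[i] += row[i] * (fx - x)
            atl_y[i] += row[i] * (fy - y)
    return solve_3x3(ata, atl_x), solve_3x3(ata, atl_y)


if __name__ == "__main__":
    # Synthetic check: simulate RFM projections that carry a known affine bias.
    a_true, b_true = (2.0, 1e-4, -5e-5), (-1.5, 0.0, 2e-4)
    pts = [(x, y) for x in (0.0, 5000.0, 10000.0) for y in (0.0, 5000.0, 10000.0)]
    proj = [(x + a_true[0] + a_true[1] * x + a_true[2] * y,
             y + b_true[0] + b_true[1] * x + b_true[2] * y) for x, y in pts]
    a_est, b_est = fit_affine_compensation(pts, proj)
    print([round(v, 6) for v in a_est], [round(v, 6) for v in b_est])
```

In a real block adjustment the affine parameters of many images are solved jointly with tie points; this single-image version only shows the shape of the estimation.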
The spaceborne laser ranging system mainly records the range (distance measurement value) to the detected target. Through systematic analysis of the error sources, an error model for spaceborne laser altimetry can be established as follows:
$$
\mathbf{X}_P = \mathbf{X}_S + \mathbf{R}\,\mathbf{R}_{\theta}(\theta + \Delta\theta)
\begin{bmatrix} 0 \\ 0 \\ \rho + \Delta\rho \end{bmatrix}
\qquad (2)
$$
where $\mathbf{X}_P$ is the coordinate of the laser footprint, $\mathbf{X}_S$ is the position of the scanning center at the time of laser pulse emission, $\mathbf{R}$ is the attitude matrix at the time of laser pulse emission, $\mathbf{R}_{\theta}$ is the laser scanning angle rotation matrix, $\theta$ is the scanning angle, $\Delta\theta$ is the zero-offset error of the scanning angle, $\rho$ is the laser pulse ranging value, and $\Delta\rho$ is the ranging error.
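The laser geolocation model, in which the footprint equals the scan-center position plus the range vector rotated by the attitude matrix and the scan-angle rotation, can be evaluated directly to see how the scan-angle zero error and the ranging error move the footprint. This is a sketch under stated assumptions: the scan rotation is taken about the body x-axis, the range vector lies along the body z-axis, and the attitude matrix diag(1, -1, -1) describes a hypothetical nadir-pointing case in a local frame whose z-axis points up. None of these conventions come from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mat(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def scan_rotation(angle_rad: float) -> Matrix:
    """R_theta: rotation about the body x-axis by the scan angle (axis choice assumed)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def laser_footprint(x_s: Sequence[float], r_att: Matrix, theta: float,
                    d_theta: float, rho: float, d_rho: float) -> Vector:
    """X_P = X_S + R * R_theta(theta + d_theta) * [0, 0, rho + d_rho]^T."""
    rot = mat_mat(r_att, scan_rotation(theta + d_theta))
    offset = mat_vec(rot, [0.0, 0.0, rho + d_rho])
    return [x_s[i] + offset[i] for i in range(3)]


# Hypothetical nadir-pointing setup: the body z-axis points to the ground and
# diag(1, -1, -1) maps it into a local frame whose z-axis points up.
R_NADIR: Matrix = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
ARCSEC = math.pi / (180.0 * 3600.0)

if __name__ == "__main__":
    x_s = [0.0, 0.0, 500_000.0]  # scan centre 500 km above the local origin (assumed)
    print(laser_footprint(x_s, R_NADIR, 0.0, 0.0, 500_000.0, 0.0))     # ideal footprint
    print(laser_footprint(x_s, R_NADIR, 0.0, ARCSEC, 500_000.0, 0.0))  # 1" zero error
    print(laser_footprint(x_s, R_NADIR, 0.0, 0.0, 500_000.0, 1.0))     # 1 m ranging error
```

With a 500 km range, a 1" zero-offset error shifts the footprint horizontally by about 2.4 m, while a 1 m ranging error shifts it vertically by 1 m; this is why both terms appear in the error model.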

Intelligent interpretation methods for typical land cover elements
With the rapid development of artificial intelligence, intelligent methods represented by deep learning have proven efficient at extracting land cover features by automatically learning image features of complex terrain [Alhassan V, Vali A, Zhang Jixian]. However, automatic feature extraction worldwide faces problems such as scarce overseas training samples, insufficient single data sources, and poor model robustness across scenarios. In response, strategies such as constructing massive sample sets, building multimodal data fusion models, and establishing a continuous learning extraction mechanism have been proposed.
(1) Constructing a massive sample set based on existing classification results
Deep learning is still essentially data-driven, and a massive, diverse set of remote sensing image samples is the foundation for high-precision intelligent extraction of typical land cover elements at the global scale. Existing global or large-area interpretation results provide rich datasets that can alleviate the current scarcity of high-quality samples, and the potential knowledge or rules in their sample labels can be effectively mined. Accordingly, this paper proposes a method for constructing a massive sample set from existing classification results, which uses basic geographic information data and index-based feature extraction to obtain image-label-feature information. To ensure sample accuracy, a sample selection and iterative optimization strategy is also proposed, in which the optimal sample dataset is obtained through manual interactive screening, automatic iterative optimization of samples by the model, and other methods.
The development of global land cover data products involves massive processing of tens of thousands of remote sensing images, but image defects caused by clouds, terrain shadows, sensor hardware, and other factors are common, so it is difficult to achieve spatially and temporally continuous global coverage of high-quality images with a single data source. To address this issue, this paper proposes a method for constructing a multimodal data fusion network model that exploits the diversity of multimodal data in complex scenes to better meet the needs of global mapping. The deep network uses a dual-branch convolutional neural network that couples optical spectral features with SAR scattering features, so that the multidimensional features complement each other. In addition, to handle differences in imaging conditions and acquisition dates among the multimodal data, a self-attention-based long short-term memory (LSTM) network automatically captures the spatial and temporal context of multi-source images to achieve accurate recognition of typical elements.
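The index-based screening of labels inherited from existing classification results can be sketched as a consistency check between each legacy label and simple spectral indices. This hypothetical illustration computes NDVI and the McFeeters NDWI per pixel and keeps a sample only when its indices agree with a rule for its class. The class names, reflectance values, and thresholds (NDVI > 0.3 for vegetation, NDWI > 0 for water) are assumptions rather than the paper's rules, and the manual screening and model-driven iterative optimization steps are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Pixel:
    green: float  # surface reflectance
    red: float
    nir: float
    label: str  # label inherited from an existing land cover product


def ndvi(p: Pixel) -> float:
    return (p.nir - p.red) / (p.nir + p.red) if p.nir + p.red else 0.0


def ndwi(p: Pixel) -> float:
    # McFeeters NDWI: (Green - NIR) / (Green + NIR)
    return (p.green - p.nir) / (p.green + p.nir) if p.green + p.nir else 0.0


# Hypothetical per-class consistency rules; thresholds are illustrative only.
RULES: Dict[str, Callable[[Pixel], bool]] = {
    "vegetation": lambda p: ndvi(p) > 0.3,
    "water": lambda p: ndwi(p) > 0.0,
    "built-up": lambda p: ndvi(p) < 0.2 and ndwi(p) < 0.0,
}


def purify(samples: List[Pixel]) -> List[Pixel]:
    """Keep samples whose spectral indices are consistent with their legacy label."""
    kept = []
    for p in samples:
        rule = RULES.get(p.label)
        if rule is None or rule(p):  # classes without a rule are passed on for manual review
            kept.append(p)
    return kept


if __name__ == "__main__":
    samples = [Pixel(0.08, 0.05, 0.40, "vegetation"), Pixel(0.10, 0.12, 0.15, "vegetation"),
               Pixel(0.10, 0.05, 0.03, "water"), Pixel(0.10, 0.12, 0.30, "water")]
    print([(p.label, round(ndvi(p), 2), round(ndwi(p), 2)) for p in purify(samples)])
```

A filter like this only removes obvious label noise; the iterative strategy in the text would then retrain the model on the retained samples and re-screen.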
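The temporal-context step relies on self-attention. A minimal sketch of scaled dot-product self-attention over a multi-temporal feature sequence is shown below; it is a simplification, not the paper's network. It omits the learned query/key/value projections, the LSTM, and the convolutional branches, and it assumes each time step already has a fused optical plus SAR feature vector; the feature values in the example are illustrative.

```python
import math
from typing import List, Tuple

Seq = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: Seq) -> Tuple[Seq, Seq]:
    """Scaled dot-product self-attention over T time steps with identity Q/K/V.

    seq: T feature vectors of dimension d (e.g. fused optical + SAR features of
    one pixel at T acquisition dates). Returns (context vectors, attention weights).
    """
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in seq:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in seq]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted mix of all time steps.
        outputs.append([sum(w[t] * seq[t][j] for t in range(len(seq))) for j in range(d)])
    return outputs, weights


if __name__ == "__main__":
    # Two optical + two SAR features per date; the third date is cloud-affected.
    seq = [[1.0, 0.0, 0.5, 0.2], [0.9, 0.1, 0.4, 0.3], [0.0, 1.0, 0.1, 0.9]]
    _, w = self_attention(seq)
    print([[round(v, 3) for v in row] for row in w])
```

Each date draws most of its context from dates with similar features, which is how attention can down-weight an anomalous acquisition without an explicit cloud mask.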

Multi-source data mining technology
In vector data production, especially in obtaining attribute content, there are two main problems: first, field surveys cannot be conducted on a large scale, so element attributes are difficult to obtain; second, although internet resources are abundant, most are not directly relevant to mapping in difficult areas, their quality is uneven, their content and forms vary, and regional differences around the world are significant. Therefore, this paper proposes using multi-source data fusion and mining technology to rapidly extract information such as toponyms and ground-object attributes. Combining open-source data such as OSM with data from authoritative websites in various countries, information fusion is performed using vector tile element reconstruction models, multi-geometric-factor road network matching algorithms, natural language processing, and other technologies, and production-oriented workflows are established to ensure the completeness, accuracy, and richness of ground-object attribute information.
(1) Multimodal mining of heterogeneous data
To address the differing forms and chaotic attribute structures of feature data obtained from multiple sources, a theoretical method and technical framework for extracting and integrating overseas terrain feature data have been formed, and a global conversion model library and a semantic information database have been established, providing technical support and a data foundation for acquiring overseas terrain feature data. Vector tile elements and annotations are automatically extracted from online map services such as MapBox.
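One way to picture a conversion model library is as a set of per-source rules that map heterogeneous attribute keys and values onto a single target schema. In the sketch below, the OSM `highway` and `name` tags are real OSM conventions, but the second source (`source_b`), its field names, the target fields, and every key and value mapping are hypothetical assumptions for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical conversion models: per source, a field mapping and value mappings.
CONVERSION_MODELS: Dict[str, Dict[str, Any]] = {
    "osm": {
        "fields": {"highway": "road_class", "name": "name", "name:en": "name_en"},
        "values": {
            "road_class": {"motorway": "expressway", "trunk": "national",
                           "primary": "national", "secondary": "provincial",
                           "residential": "local"},
        },
    },
    "source_b": {  # a hypothetical national portal with its own schema
        "fields": {"ROAD_TYPE": "road_class", "NAME_LOCAL": "name"},
        "values": {"road_class": {"1": "expressway", "2": "national", "3": "provincial"}},
    },
}

TARGET_FIELDS = ("road_class", "name", "name_en")


def convert(source: str, attrs: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Map one feature's attributes from a source schema to the target schema."""
    model = CONVERSION_MODELS[source]
    out: Dict[str, Optional[str]] = {f: None for f in TARGET_FIELDS}
    for src_key, value in attrs.items():
        target = model["fields"].get(src_key)
        if target is None:
            continue  # attribute not covered by the conversion model
        value = str(value).strip()
        value_map = model["values"].get(target)
        # Coded fields are translated; unmapped codes are flagged as "unknown".
        out[target] = value_map.get(value, "unknown") if value_map else value
    return out


if __name__ == "__main__":
    print(convert("osm", {"highway": "primary", "name": " Main St ", "surface": "asphalt"}))
    print(convert("source_b", {"ROAD_TYPE": 1, "NAME_LOCAL": "Route 9"}))
```

Keeping the rules as data rather than code lets new sources be added to the library without changing the conversion logic.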
Figure 5 Multimodal Mining of Heterogeneous Data
(2) Multi-factor matching and fusion of heterogeneous data
A multi-factor matching method based on semantic-information-assisted SM_HD distance, direction angle, and length ratio is proposed. Typical elements of crowdsourced vector data are matched to the same real-world entities to establish relationships between elements, which improves matching accuracy and the accuracy of the results. In addition, through a traceability mechanism, the matching method records and traces the preprocessing history of each element in the reference and target datasets, which resolves errors such as discontinuities and offsets in road network matching results, thereby improving the completeness and accuracy of the fused road network data and preserving the complete road network structure. Furthermore, a multi-index place-name and address matching method based on normalized name, category, and location matches place-name and address data to the same entities to establish relationships between elements.
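A multi-factor road matching score of this kind can be sketched by combining a distance term, a direction term, and a length-ratio term, with semantic agreement acting as a gate. In the sketch below, SM_HD is not defined in the text, so a plain Hausdorff distance (approximated at vertices, measured to segments) stands in for it. The 30 m distance cap, the weights, and the gating rule are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Line = List[Point]


def _point_seg(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _directed(p: Line, q: Line) -> float:
    return max(min(_point_seg(u, q[i], q[i + 1]) for i in range(len(q) - 1)) for u in p)


def hausdorff(a: Line, b: Line) -> float:
    """Symmetric Hausdorff distance, approximated at the vertices of each polyline."""
    return max(_directed(a, b), _directed(b, a))


def length(line: Line) -> float:
    return sum(math.dist(line[i], line[i + 1]) for i in range(len(line) - 1))


def direction_deg(line: Line) -> float:
    """Undirected start-to-end bearing in [0, 180)."""
    (x0, y0), (x1, y1) = line[0], line[-1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0


def match_score(a: Line, b: Line, same_class: bool, d_max: float = 30.0,
                w: Sequence[float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted similarity in [0, 1]; semantic disagreement vetoes the match."""
    if not same_class:
        return 0.0
    s_dist = max(0.0, 1.0 - hausdorff(a, b) / d_max)
    diff = abs(direction_deg(a) - direction_deg(b))
    s_dir = 1.0 - min(diff, 180.0 - diff) / 90.0
    la, lb = length(a), length(b)
    s_len = min(la, lb) / max(la, lb) if max(la, lb) > 0 else 0.0
    return w[0] * s_dist + w[1] * s_dir + w[2] * s_len


if __name__ == "__main__":
    ref = [(0.0, 0.0), (100.0, 0.0)]
    near = [(0.0, 5.0), (50.0, 6.0), (100.0, 5.0)]   # same road, slightly offset
    cross = [(50.0, -50.0), (50.0, 50.0)]            # a crossing road
    print(round(match_score(ref, near, True), 3), round(match_score(ref, cross, True), 3))
```

Combining several weak cues this way tolerates the positional offsets typical of crowdsourced roads while still rejecting crossing or parallel roads of different classes.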

Figure 6 Multifactor Matching and Fusion of Heterogeneous Data
(3) Accuracy verification and reliability analysis
To address the uneven geometric accuracy of crowdsourced geographic information from the internet, a method that automatically corrects the geometric accuracy of vector data based on remote sensing image feature recognition with Mask R-CNN is proposed to verify the geometric accuracy of crowdsourced geographic information. In addition, for geometry, distance, direction angle, matching-pair overlap, and completeness are used as indicators of the geometric quality of vector data; for attributes, field integrity, attribute classification information, and attribute value similarity are used as indicators of attribute consistency.
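The three attribute indicators can be sketched for a single matched pair of features. This is a hypothetical illustration: the field names `name` and `class` are assumptions, class agreement is reduced to an exact match, and Python's `difflib.SequenceMatcher` ratio stands in for whatever attribute value similarity measure the production workflow actually uses.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional

Record = Dict[str, Optional[str]]


def field_integrity(record: Record, required: Iterable[str]) -> float:
    """Share of required fields that are present and non-empty."""
    fields = list(required)
    if not fields:
        return 1.0
    filled = sum(1 for f in fields if (record.get(f) or "").strip())
    return filled / len(fields)


def value_similarity(a: Optional[str], b: Optional[str]) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio as a stand-in)."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.strip().casefold(), b.strip().casefold()).ratio()


def attribute_indicators(ref: Record, cand: Record,
                         required: Iterable[str] = ("name", "class")) -> Dict[str, float]:
    """Hypothetical attribute-consistency indicators for one matched feature pair."""
    same_class = bool(ref.get("class")) and ref.get("class") == cand.get("class")
    return {
        "field_integrity": field_integrity(cand, required),
        "class_agreement": 1.0 if same_class else 0.0,
        "value_similarity": value_similarity(ref.get("name"), cand.get("name")),
    }


if __name__ == "__main__":
    ref = {"name": "Nairobi Road", "class": "national"}
    print(attribute_indicators(ref, {"name": "nairobi road ", "class": "national"}))
    print(attribute_indicators(ref, {"name": "", "class": "local"}))
```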

Intelligent DEM editing technology with multimodal imaging
At present, DEM production from optical stereo images is relatively mature. With the launch and application of domestic SAR satellites, the demand for radar interferometry in mapping cloudy and rainy areas is increasing. Internationally, the SAR satellite images processed mainly include RADARSAT and TerraSAR-X/TanDEM-X; the TanDEM-X system used InSAR technology to acquire global commercial terrain data (WorldDEM) [HUESO]. Domestic SAR images come mainly from Tianhui-2 (TH2) and L-SAR [Yu Bo, LOU Liangsheng, LI Tao]. Because the application of domestic SAR images has only just begun, targeted research is needed in areas such as interferometric processing and hybrid DEM compilation to improve the utilization rate and production efficiency of satellite resources.
(1) Weak-coherence hierarchical unwrapping technology
Owing to noise or topographic geometry, interferograms inevitably contain low-coherence regions. In these low-quality regions the unwrapping algorithm itself may produce large errors, and these errors may propagate into high-coherence regions, degrading the unwrapping accuracy of high-coherence points and thereby the accuracy of the InSAR results. This paper therefore proposes an unwrapping method based on hierarchical network constrained adjustment (HNCA). Guided by the interferometric phase quality map, the method applies a suitable two-dimensional unwrapping method to the high-quality primary points; for the low-quality secondary-point areas, it uses the unwrapped results of the high-quality points as control points and adopts a constrained network adjustment unwrapping method. These constraints preserve the unwrapping accuracy of the high-quality points while also improving that of the low-quality points (Figure 7).
(3) Intelligent DEM editing technology
Because terrain worldwide is complex and diverse, and InSAR DEMs lack a three-dimensional editing environment, relation network learning is used to learn the deep features and classification capability of SAR images, achieving accurate recognition of SAR ground objects from small samples and automatically extracting buildings, vegetation, water systems, and other objects. For DEM editing in vegetated areas, external data such as ICESat-2 are used to interpolate height-correction surfaces that remove vegetation height. For DEM editing in built-up areas, SAR grayscale images and DSM data are used to select ground points within the building area, and several interpolation and filtering methods are used to remove building heights. For DEM editing of water bodies, water bodies are classified by size and shape, and external data are used as auxiliary constraints: large water bodies such as oceans and lakes are assigned a uniform elevation, while flowing water such as rivers is assigned an elevation that changes progressively from upstream to downstream.
(2) Research on crowdsourced data crawling and application of big-data mining technology
Drawing on theories and methods from natural language processing, machine learning, and other fields, this work will achieve crawling of ubiquitous internet geographic information, cleaning of web geographic data, fusion of multi-source geographic data, and accuracy verification of important ground features, so as to ensure the completeness of attribute information and the accuracy of vector data. Non-remote-sensing information on all elements, from ubiquitous geographic information to socio-economic data, will be acquired and mined to realize spatiotemporal big-data mining and the release of thematic services. Building on the convergence of spatiotemporal big data, the latent value of global geographic information resources can be fully exploited to provide reliable, data-driven quantitative methods for evaluating the current state of socio-economic development in various fields.
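The core idea of the hierarchical unwrapping in item (1) of the DEM section, namely that already unwrapped high-quality points constrain the integer ambiguity of low-quality points, can be illustrated point by point. This is deliberately simplified and is not HNCA: instead of a constrained network adjustment over phase differences, it predicts the unwrapped phase at a low-quality point by inverse-distance weighting of nearby control points (an assumed interpolator) and picks the multiple of 2π that brings the wrapped phase closest to that prediction.

```python
import math
from typing import List, Tuple

TWO_PI = 2.0 * math.pi
Control = Tuple[float, float, float]  # (x, y, unwrapped phase in radians)


def wrap(phase: float) -> float:
    """Wrap a phase value into [-pi, pi)."""
    return (phase + math.pi) % TWO_PI - math.pi


def idw_predict(x: float, y: float, controls: List[Control], power: float = 2.0) -> float:
    """Inverse-distance-weighted prediction of the unwrapped phase at (x, y)."""
    num = den = 0.0
    for cx, cy, phi in controls:
        d = math.hypot(x - cx, y - cy)
        if d == 0.0:
            return phi
        w = 1.0 / d ** power
        num += w * phi
        den += w
    return num / den


def unwrap_with_controls(x: float, y: float, wrapped: float, controls: List[Control]) -> float:
    """Choose the 2*pi multiple that brings the wrapped phase closest to the prediction."""
    pred = idw_predict(x, y, controls)
    k = round((pred - wrapped) / TWO_PI)
    return wrapped + TWO_PI * k


if __name__ == "__main__":
    # Synthetic phase ramp phi = 0.5 * x; two unwrapped high-quality control points.
    controls = [(10.0, 0.0, 5.0), (14.0, 0.0, 7.0)]
    print(unwrap_with_controls(12.0, 0.0, wrap(6.0), controls))
```

The method in the text performs this constraint jointly across a network of secondary points, which also limits how far an error at one noisy point can spread.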
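The water-body editing rules in item (3) of the DEM section can be sketched with two small functions. Both rules are assumptions made for illustration: the uniform lake or sea level is taken as the median of the DEM heights sampled along the shoreline, and the progressive river elevation is interpreted as a profile, ordered from upstream to downstream, that is never allowed to rise.

```python
from statistics import median
from typing import List


def flatten_lake(shore_heights: List[float]) -> float:
    """Uniform water level for a lake: median of shoreline DEM heights (assumed rule)."""
    return median(shore_heights)


def enforce_downstream(profile: List[float]) -> List[float]:
    """Make a river profile (upstream -> downstream) monotonically non-increasing."""
    out: List[float] = []
    current = float("inf")
    for h in profile:
        current = min(current, h)  # any rise downstream is clipped to the running minimum
        out.append(current)
    return out


if __name__ == "__main__":
    print(flatten_lake([101.0, 100.2, 100.5, 99.8, 100.4]))
    print(enforce_downstream([50.0, 49.5, 50.3, 48.0, 48.6, 47.0]))
```

A running minimum is the simplest monotonic correction; a production workflow would more likely fit a smooth decreasing profile constrained by external data.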
(3) Developing the application of artificial intelligence in geographic information data production
To take full advantage of the new wave of artificial intelligence technology, the practical effectiveness of remote sensing intelligent interpretation methods must be further explored so that they can be deployed in real remote sensing applications. For multi-scenario applications ranging from land cover to core vector element mapping, a strategy driven jointly by domain knowledge and remote sensing characteristics is needed to meet the cross-scenario and multi-scale requirements of various industry applications. Integrating geoscience knowledge rules and remote sensing features into deep learning models, focusing on the collaboration and integration of massive remote sensing data, and establishing standardized, authoritative, unified sample libraries for different application scenarios are of great practical value for supporting subsequent technological development and rapid real-time monitoring.

Figure 3 Flowchart for Sample Set Construction
(2) Construction of deep network for multimodal data fusion

Figure 7 Results of Different Unwrapping Methods: (a1) non-hierarchical WLS method; (a2) WLS for primary points and HNCA for secondary points; (b1) MCF method; (b2) MCF for primary points and HNCA for secondary points; (c1) SNAPHU method; (c2) SNAPHU for primary points and HNCA for secondary points. The black ellipses indicate regions with low coherence and severe noise.
(2) Multi-source terrain data fusion technology
Given the complexity of multi-mode Earth observation in global mapping, the combined use of InSAR multi-baseline and ascending/descending-orbit fusion, together with fusion of DSM products derived from optical and SAR data, is the key technology for exploiting the complementary advantages of multi-source data and an important means of improving DSM quality. The registered multi-source DSMs are divided into overlapping and non-overlapping regions [25]. Data in overlapping regions are integrated by an optimization approach in which weights are derived from the quality evaluation of each data source; for example, InSAR DSM error is caused mainly by baseline error, followed by interferometric phase error. Weighted-average fusion is then performed pixel by pixel with these weights to obtain a preliminary fused DSM for the overlapping region. Processing of non-overlapping regions aims to maintain data consistency: areas with DSM holes are reconstructed so that they transition smoothly into the overlapping region, and data edges are smoothed.
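The pixel-by-pixel weighted fusion of co-registered DSMs can be sketched as follows. The text derives the weights from a quality evaluation of each source; here inverse-variance weighting with an assumed per-source elevation error is used as a common stand-in. Pixels covered by a single source keep that source's value, and pixels covered by none remain holes (None) for the later smoothing and reconstruction step, which is not shown.

```python
from typing import List, Optional, Sequence

Grid = List[List[Optional[float]]]


def fuse_dsms(dsms: Sequence[Grid], sigmas: Sequence[float]) -> Grid:
    """Pixel-wise inverse-variance weighted fusion of co-registered DSMs.

    dsms:   co-registered grids of identical shape; None marks a hole.
    sigmas: assumed per-source elevation error in metres (from quality evaluation).
    """
    weights = [1.0 / (s * s) for s in sigmas]
    rows, cols = len(dsms[0]), len(dsms[0][0])
    fused: Grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            num = den = 0.0
            for grid, w in zip(dsms, weights):
                h = grid[r][c]
                if h is not None:
                    num += w * h
                    den += w
            # Overlap -> weighted mean; single source -> its own value; none -> hole.
            fused[r][c] = num / den if den > 0 else None
    return fused


if __name__ == "__main__":
    insar = [[100.0, 102.0, None]]
    optical = [[103.0, None, None]]
    print(fuse_dsms([insar, optical], sigmas=[2.0, 1.0]))
```

With an assumed 2 m error for the InSAR DSM and 1 m for the optical DSM, the optical source receives four times the weight in the overlap.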

Figure 8 Intelligent DEM Data Acquisition and Compilation Technology
Continuous learning mechanism of model reuse and migration
Because land cover types are highly heterogeneous across global regions, a single classification model can hardly work effectively everywhere, and models must be trained and optimized according to different levels of heterogeneity and type characteristics. However, current deep neural network models are generally static and cannot adapt or expand as land cover heterogeneity becomes more complex. To address this issue, this paper proposes a continuous learning mechanism of model reuse and migration. First, the current model is trained on the sample set of a specific region. When moving across regions or scenarios, a continuous learning strategy is introduced: when only a small number of labeled samples are available in the region to be interpreted, source-domain samples are mixed with target-domain samples to form a training set, and the trained model continuously adjusts the weight of each sample, so that the deep learning model parameters are continuously updated for different regions. This improves the generalization ability of the land cover classification model and saves computing resources.
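The text does not specify how the trained model adjusts the sample weights. TrAdaBoost is one well-known scheme with the same structure: it mixes source-domain and target-domain samples, and in each round it down-weights misclassified source samples (which may not transfer) and up-weights misclassified target samples (which the model should learn). The sketch below shows a single TrAdaBoost-style reweighting round as a stand-in, not as the paper's mechanism. The per-sample error flags would come from the current classifier.

```python
import math
from typing import List, Tuple


def tradaboost_update(w_src: List[float], w_tgt: List[float],
                      err_src: List[int], err_tgt: List[int],
                      n_iterations: int) -> Tuple[List[float], List[float]]:
    """One TrAdaBoost-style reweighting round (a stand-in for the paper's scheme).

    err_* hold 1 where the current model misclassifies a sample, else 0.
    Returns the updated source and target weights, normalised to sum to 1.
    """
    n_src = len(w_src)
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_src) / n_iterations))
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)  # weighted target error
    eps = min(max(eps, 1e-6), 0.499)  # keep beta_tgt inside (0, 1)
    beta_tgt = eps / (1.0 - eps)
    new_src = [w * beta_src ** e for w, e in zip(w_src, err_src)]   # shrink bad source samples
    new_tgt = [w * beta_tgt ** (-e) for w, e in zip(w_tgt, err_tgt)]  # boost hard target samples
    total = sum(new_src) + sum(new_tgt)
    return [w / total for w in new_src], [w / total for w in new_tgt]


if __name__ == "__main__":
    src, tgt = tradaboost_update([0.2, 0.2], [0.15] * 4, [1, 0], [1, 0, 0, 0], n_iterations=10)
    print([round(w, 3) for w in src], [round(w, 3) for w in tgt])
```

Repeating this update while retraining lets the few target-region labels gradually dominate, which matches the goal of adapting a source-region model without training from scratch.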