RESEARCH ON SELF-CROSS-TRANSFORMER MODEL OF POINT CLOUD CHANGE DETECTION

ABSTRACT: With the vigorous development of the urban construction industry, engineering deformations and changes often occur during construction. To counter this, changes must be detected in time in order to expose construction flaws, ensure the integrity of the project, and reduce labour costs. In the study of change detection in 3D point clouds, researchers have published a variety of methods. Most are based on traditional threshold-distance methods (C2C, M3C2, M3C2-EP), while others convert the 3D point cloud into a DSM, which loses much of the original information. Although deep learning is widely used in remote sensing, for change detection on 3D point clouds the data are usually converted into two-dimensional patches, and neural networks are rarely applied directly to the points; we prefer a network that labels change at the level of individual pixels or points. Therefore, in this article we build a network for 3D point cloud change detection and propose a new module, the Cross transformer, suited to change detection. We also simulate tunnel data for change detection and run test experiments with our network.


INTRODUCTION
With the vigorous development of the construction industry, engineering problems during construction are particularly significant, and in engineering applications we need to detect changes between different time periods. While many existing studies use 2D images for change detection, 3D point clouds bring complementary height information, which is useful when monitoring construction sites, since the main modifications occur along the height axis. Furthermore, the spectral variability of the same object over time and differences in viewing angle between acquisitions limit purely 2D image-based comparison; we therefore also use simulated datasets capable of introducing construction changes. We then compare representative approaches from the state of the art, ranging from classical distance-based approaches to recent deep learning developments, in the context of aerial lidar surveys (ALS) in urban areas.

RELATED WORKS
In this section, we briefly review general methods for change detection in 3D point clouds. Existing 3D PC-based methods for change detection and characterization in urban environments are reviewed. Although there are many ways to convert PCs to DSMs, we do not focus on those studies, as they are not directly related to the scope of our paper.
Unlike 2D images organized on a regular pixel grid, 3D point clouds generated by lidar are unordered and irregular, which makes it difficult to extract information from these data, and comparison between time stamps is even more difficult.
In fact, the location and distribution of points can differ greatly even in unchanged regions. Therefore, some methods convert a 3D point cloud into a 2D matrix that stores elevation information in each pixel. These 2D grids are called digital surface models (DSMs).
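As an illustration, a DSM can be obtained from a raw point cloud by keeping the highest point in each grid cell. The following minimal numpy sketch shows the idea; the function name, NaN convention for empty cells, and bounds handling are our own choices, not taken from any cited work:

```python
import numpy as np

def rasterize_to_dsm(points, cell_size, bounds=None):
    """Rasterize an (N, 3) point cloud into a DSM grid that keeps the
    highest z value per cell; empty cells are NaN."""
    xyz = np.asarray(points, dtype=float)
    if bounds is None:
        xmin, ymin = xyz[:, 0].min(), xyz[:, 1].min()
        xmax, ymax = xyz[:, 0].max(), xyz[:, 1].max()
    else:
        xmin, ymin, xmax, ymax = bounds
    ncols = int(np.floor((xmax - xmin) / cell_size)) + 1
    nrows = int(np.floor((ymax - ymin) / cell_size)) + 1
    dsm = np.full((nrows, ncols), np.nan)
    cols = ((xyz[:, 0] - xmin) / cell_size).astype(int)
    rows = ((xyz[:, 1] - ymin) / cell_size).astype(int)
    for r, c, z in zip(rows, cols, xyz[:, 2]):
        if np.isnan(dsm[r, c]) or z > dsm[r, c]:
            dsm[r, c] = z  # keep only the highest point per cell
    return dsm
```

Keeping only the highest point per cell is exactly where the information loss discussed above occurs: all lower points falling in the same cell are discarded.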
The idea of this family of methods is to compute the DSM of each of the two point clouds and subtract them directly to retrieve the differences. It was first used for building change extraction by Murakami et al. (1999) and is still frequently used thanks to its simplicity and the quality of its results; DSM differencing is also common in the Earth observation community (Okyay et al., 2019). The difference map can be segmented with the Otsu algorithm, which picks a threshold from the histogram of values by maximizing the between-class variance (Otsu, 1979). However, DSMs contain artifacts, e.g. due to interpolation in hidden parts or the difficulty of retrieving precise building boundaries (Gharibbafghi et al., 2019). Several pipelines derive more precise and finer-grained changes than simple positive or negative change: after empirical thresholding of the DSM difference, 3D building changes can be selected based on the size, height, and shape of the remaining pixel clusters (Dini et al., 2012). From 3D point clouds, DSMs and digital terrain models (DTMs) can be extracted relying on ground points: Teo et al. (2012) retrieve and classify objects at each date using the DSM difference and the DTM, and the segmented objects can then be compared between the two epochs to determine changes. Pang et al. (2014) also extract building change candidates by thresholding DSM changes. DSM differencing with basic thresholds or further refinements is still widely used for 3D urban monitoring (Warth et al., 2019) and for post-disaster building damage assessment (Wang et al., 2020).
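The DSM differencing plus Otsu thresholding pipeline described above can be sketched as follows; the bin count and the binary mask output are our own illustrative choices:

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold that maximizes the between-class
    variance of the value histogram (Otsu, 1979)."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                # class-0 (below threshold) probability
    mu = np.cumsum(p * centers)      # cumulative mean
    mu_t = mu[-1]                    # global mean
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu_t * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(sigma_b)]

def dsm_change_mask(dsm_t1, dsm_t2):
    """DSM differencing: subtract the two epochs and threshold |dDSM|
    with Otsu to obtain a binary change mask."""
    diff = dsm_t2 - dsm_t1
    mag = np.abs(diff[np.isfinite(diff)])
    t = otsu_threshold(mag)
    return np.abs(diff) > t, diff
```

The sign of `diff` additionally distinguishes positive (new/raised) from negative (demolished/lowered) changes, which the refinement pipelines cited above then filter by cluster size and shape.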
With the rise of deep learning methods in Earth observation, change detection in 2D imagery has also benefited from this advance. For example, convolutional neural networks (CNNs) applied to RGB images can assess building damage due to earthquakes (Kalantar et al., 2020); that study compared three different architectures, and the best results were obtained with a Siamese architecture. Siamese networks are used for change detection or similarity computation between two inputs and have therefore been heavily used in remote sensing applications (Shi et al., 2020); they can provide reliable results even with heterogeneous inputs such as optical and synthetic aperture radar (SAR) images (Mou et al., 2017), or with point clouds rasterized into DSMs (Zhang et al., 2019), which requires less computational effort than point-based processing (Shirowzhan et al., 2019). There are also semantics-based approaches. Among them, Awrangjeb et al. (2015) first extract building boundaries from lidar data and aerial images to obtain 2D building footprints, and then compare the footprints to highlight changes on 2D maps; Xu et al. (2015) propose another semantics-based approach.

Background
To address the problem of change detection and characterization in 2D images, recent studies propose to use deep Siamese fully convolutional networks. Such a network consists of a common encoder-decoder with skip connections. To extract features, both images pass through the encoder part, which consists of two branches, one per image. Each branch is a series of traditional convolution and pooling layers that extract information at several scales. The particularity of the Siamese network is that at each pooling step, the difference between the features extracted by the two branches is kept and concatenated at the corresponding scale in the decoder part (Daudt et al., 2018). If the data are very similar, the two branches of the encoder can share weights so as to extract features in the same way.
When the data are significantly different, for example if the images come from two types of sensors (such as optical and radar sensors), the weights may be kept independent, resulting in a so-called pseudo-Siamese network (Zhan et al., 2017).
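A minimal two-scale PyTorch sketch of such a Siamese difference network follows; channel sizes, depth, and the nearest-neighbour upsampling are illustrative choices of ours, not the exact architecture of Daudt et al. (2018):

```python
import torch
import torch.nn as nn

class SiamDiffNet(nn.Module):
    """Two-scale sketch in the spirit of FC-Siam-diff (Daudt et al., 2018):
    a shared-weight encoder processes both epochs, and the absolute feature
    differences are skip-connected into the decoder at each scale."""

    def __init__(self, in_ch=3, n_classes=2):
        super().__init__()

        def block(ci, co):
            return nn.Sequential(nn.Conv2d(ci, co, 3, padding=1),
                                 nn.ReLU(inplace=True))

        self.enc1, self.enc2 = block(in_ch, 16), block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.dec2 = block(32, 16)        # decodes the coarse-scale difference
        self.dec1 = block(16 + 16, 16)   # fuses with the fine-scale difference
        self.head = nn.Conv2d(16, n_classes, 1)

    def encode(self, x):
        f1 = self.enc1(x)                # fine-scale features
        f2 = self.enc2(self.pool(f1))    # coarse-scale features
        return f1, f2

    def forward(self, x_t1, x_t2):
        a1, a2 = self.encode(x_t1)       # shared weights: same encoder twice
        b1, b2 = self.encode(x_t2)
        d1, d2 = (a1 - b1).abs(), (a2 - b2).abs()
        y = self.up(self.dec2(d2))       # decode coarse difference, upsample
        y = self.dec1(torch.cat([y, d1], dim=1))
        return self.head(y)              # per-pixel change logits
```

Making the two `encode` calls use separate, independently trained copies of the encoder would turn this into the pseudo-Siamese variant mentioned above.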

Our framework
To extend the Siamese principle to 3D point clouds, we propose to embed a modified Point transformer architecture into a deep Siamese network, in which the point clouds from the two time periods pass through the same encoder with shared weights. As in common encoder-decoders with skip connections, at each scale of the decoding part we concatenate the differences of the extracted features with the corresponding encoding scale (Figure 1). In practice, the computation of such feature differences is not trivial, since the two point clouds do not contain the same number of points and are not defined at the same locations, even in unchanged regions. In this section we follow (Wang et al., 2019): for each point x_i, its neighbourhood in the layer graph G is the set of points x_{j1}, ..., x_{jk} closest to x_i in feature space; in other words, the architecture learns how to build the graph G used in each layer, rather than fixing it as a constant constructed before evaluating the network. In the implementation, we compute a pairwise distance matrix in feature space and take the k closest points for each point, instead of gathering neighbours within a fixed distance in coordinate space; this turns spatial proximity into semantic proximity.
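The feature-space graph construction described above can be sketched as follows (dense pairwise distances for clarity; the function name is ours):

```python
import numpy as np

def knn_graph(features, k):
    """Dynamic-graph construction (after DGCNN, Wang et al., 2019):
    pairwise distances are computed in the current feature space and each
    point is connected to its k nearest feature-space neighbours, so the
    graph is rebuilt at every layer instead of being fixed from the
    input coordinates."""
    f = np.asarray(features, dtype=float)            # (N, C) features
    sq = (f ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * f @ f.T   # squared distance matrix
    np.fill_diagonal(d2, np.inf)                     # exclude self-loops
    return np.argsort(d2, axis=1)[:, :k]             # (N, k) neighbour indices
```

Because `features` are the activations of the current layer rather than raw xyz coordinates, points that are semantically similar become neighbours even when they are spatially far apart.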
In this section, we design a self-transformer-based feature encoding module to process point clouds and effectively extract the 3D features of the scene. The Self transformer encoder-decoder module consists of an encoder and a decoder (Figure 1).
We adopt a twin structure in the encoder: the two branches share an encoder made of four SA (set abstraction) layers paired with Self transformer layers. Our network downsamples and upsamples as in PointNet++ (Qi et al., 2017), and position encoding is added.
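The PointNet++-style downsampling in the SA layers relies on farthest point sampling; a minimal numpy sketch (the random choice of the first point is one common convention, and the seeded generator is our addition for reproducibility):

```python
import numpy as np

def farthest_point_sampling(points, m, seed=0):
    """Farthest point sampling as used for downsampling in
    PointNet++-style set-abstraction layers: iteratively pick the point
    farthest from everything selected so far, giving an evenly spread
    subset of m points."""
    xyz = np.asarray(points, dtype=float)
    n = xyz.shape[0]
    rng = np.random.default_rng(seed)
    chosen = np.empty(m, dtype=int)
    chosen[0] = rng.integers(n)                       # random starting point
    dist = np.linalg.norm(xyz - xyz[chosen[0]], axis=1)
    for i in range(1, m):
        chosen[i] = int(np.argmax(dist))              # farthest remaining point
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[chosen[i]], axis=1))
    return chosen
```

Each SA layer then groups the remaining points around these samples before applying the transformer block.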
This paper uses a vector attention operator rather than scalar attention; the two operators are compared in (Zhao et al., 2021), where the authors adopt vector attention for point clouds.
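The difference between the two operators can be illustrated with a stripped-down sketch; the learned mappings and position encodings used in the actual networks are omitted, so this shows only the shape of the computation:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scalar_attention(q, k, v):
    """Scalar attention: one dot-product weight per (query, key) pair,
    shared by every feature channel."""
    w = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)   # (N, N) weights
    return w @ v

def vector_attention(q, k, v):
    """Vector attention with a subtraction relation (as in the Point
    transformer, Zhao et al., 2021): the weight is a vector per pair,
    derived from q_i - k_j, so each feature channel gets its own
    attention weight."""
    rel = q[:, None, :] - k[None, :, :]                  # (N, N, C) relations
    w = softmax(rel, axis=1)                             # per-channel weights
    return (w * v[None, :, :]).sum(axis=1)
```

In the real module, `rel` additionally passes through a small MLP and a position-encoding term before the softmax.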
Compared with scalar attention, the computation of the attention weights in vector attention is different: vector attention can modulate the attention on each feature channel. We use a subtraction relation between query and key features to compute these weights.
In our tunnel change detection, we divide the test tunnel into two datasets for easier observation. Table 2 reports the detection results, which are good on both datasets 1 and 2. In Figure 3 we compare against the ground truth: although the internal facilities of the tunnel are complex, most changes are still detected; for instance, the slight change from added wires under the water pipe on the right side of the tunnel wall in test1 is detected. We study changes in elevation, similar to urban change detection (de Gélis et al., 2023). In more complex tunnels, our network still detects most of the changes, but some changes very close to the tunnel wall are missed because the structure is too complex and self-similar. This deserves attention and leaves room for further improvement of the network.

For example, Choi et al. (2009) use the DSM difference to identify change regions, and then segment each change region by filtering and grouping. Another study uses airborne lidar (ALS) and photogrammetry to generate bi-temporal multimodal 3D information; the authors chose a Siamese architecture fed in one branch with the DSM (either the DSM difference directly, or the two DSMs as two channels) and in the other branch with the corresponding RGB orthoimage, and they also computed the change regions using the DSM information alone. Another family of 3D change detection methods relies directly on raw point clouds. First, Girardeau-Montaut et al. (2005) proposed a cloud-to-cloud (C2C) comparison based on Hausdorff point-to-point distances, with an octree subdivision of the PCs to speed up computation. Lague et al. (2013) then developed a more refined method that measures the average surface variation along the normal, with surface normals and orientations extracted at a scale consistent with the local surface roughness; this approach is called Multiscale Model-to-Model Cloud Comparison (M3C2). This second technique can distinguish between positive and negative changes, which is not possible with C2C. Other works propose to segment each point cloud to extract buildings and then create a 3D surface disparity map by computing the point-to-plane distance between points of the first set and the closest plane of the second set. While rasterizing 3D data into a 2D elevation matrix (a digital surface model, DSM) can be considered a valuable solution, the rasterization loses information because only the highest point is kept in each cell; at the same time, such unstructured point cloud data are difficult to process directly with standard tools designed for 2D images. Existing studies that directly deal with 3D point clouds thus rely on hand-crafted features or distance computations, and very few deep learning models directly address the change detection problem on raw 3D point clouds. In recent years, deep learning methods have achieved good results in remote sensing and other fields. Therefore, designing a high-precision deep network model that can directly process 3D point clouds provides a new approach to point cloud change detection. This leads us to the next section.
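For reference, the C2C baseline reduces to a nearest-neighbour distance per point; a brute-force numpy sketch (real implementations accelerate the search with an octree or kd-tree, as in Girardeau-Montaut et al. (2005)):

```python
import numpy as np

def c2c_distances(pc_t1, pc_t2):
    """Cloud-to-cloud (C2C) comparison in the spirit of
    Girardeau-Montaut et al. (2005): for every point of the first epoch,
    the distance to its nearest neighbour in the second epoch."""
    a = np.asarray(pc_t1, dtype=float)                     # (N1, 3)
    b = np.asarray(pc_t2, dtype=float)                     # (N2, 3)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)  # (N1, N2)
    return np.sqrt(d2.min(axis=1))                         # per-point C2C distance
```

Thresholding these distances flags changed points, but the result is unsigned: unlike M3C2, C2C cannot tell positive from negative change.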
To address the 3D part of the problem, we propose to rely on deep networks capable of performing semantic segmentation directly on the PC. To this end, we consider the core module of the recent Point transformer (Zhao et al., 2021), a network that achieves very good results on segmentation and classification tasks. As in neural networks for 2D images, the principle is to apply an attention mechanism, here adapted to 3D point clouds. The authors of the Point transformer implemented different types of networks inspired by traditional networks from natural language processing and 2D image transformers, applied them to point clouds, and added the position encoding that point clouds naturally provide. We further improve the Point transformer and propose a new module, the Cross transformer. Its cross-attention mechanism computes attention across the two point clouds and is better suited to change detection tasks.
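The core idea behind such cross-attention can be sketched as follows; the identity projections (no learned linear layers) and plain scalar dot-product weights are simplifications of ours, not the exact published Cross transformer module:

```python
import numpy as np

def cross_attention(feats_t1, feats_t2):
    """Cross-attention across two epochs: queries come from the epoch-1
    features while keys and values come from epoch 2, so every epoch-1
    point attends over the whole epoch-2 cloud."""
    q = np.asarray(feats_t1, dtype=float)            # (N1, C) queries, epoch 1
    k = v = np.asarray(feats_t2, dtype=float)        # (N2, C) keys/values, epoch 2
    logits = q @ k.T / np.sqrt(q.shape[1])           # (N1, N2) scores
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # softmax over epoch-2 points
    return w @ v                                     # fused epoch-1 features
```

Because each point of one epoch aggregates features from the other epoch, the operator handles the fact that the two clouds have different point counts and locations, which is what makes it attractive for change detection.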
Figure 1. Our network framework diagram.

Figure 3. The two pictures on the left are the ground truth of the two datasets, and the right shows the model detection results (red indicates change; the top two pictures are test1, the bottom two are test2).

CONCLUSION
In this article, we propose a new point cloud change detection method that uses a deep neural network combining a dynamic graph convolutional network and a Transformer model to build a change detection network and extract highly discriminative descriptors. Based on the change detection task, we design a new network module, the Cross transformer, and propose a cross-attention mechanism that computes attention scores between the two point clouds, improves the local fusion of the two clouds, and locates change areas more accurately. We apply deep learning to the field of 3D point cloud change detection; although labelling the datasets still relies on manual work, the results are no longer presented in the form of patches, and point-level change detection is achieved. Our future work will focus on model optimization for 3D point clouds, and we hope more scholars will work on the task of 3D point cloud change detection and provide more datasets.

Table 1. Change detection results on the mine dataset.

From Table 1 we can see that the accuracy metrics are favourable.

Table 2. Change detection results on the tunnel dataset.