Semi-supervised PolSAR Image Change Detection using Similarity Matching

The lack of precisely labeled data limits the development of supervised polarimetric synthetic aperture radar (PolSAR) image change detection. Semi-supervised deep learning methods address this limitation and have recently shown significant capability for PolSAR image change detection. Similarity Matching (SimMatch) improves the performance of semi-supervised learning across different benchmark datasets and settings, and introducing it into PolSAR image change detection can improve performance when labeled data are limited. Semi-supervised methods usually compensate for insufficient labeled data by generating pseudo-labels. However, when pseudo-labeling is applied naively, the model fits confident but wrong pseudo-labels, resulting in poor performance. SimMatch offers a solution by requiring the strongly augmented view to share the same semantic similarity (i.e., label prediction) and instance characteristics (i.e., similarity between instances) with the weakly augmented view, for more intrinsic feature matching. In addition, with a labeled memory buffer, the two similarities can be isomorphically transformed into each other through aggregating and unfolding operations. The semantic and instance pseudo-labels can therefore be mutually propagated, which improves PolSAR change detection performance. Experimental results on real PolSAR datasets demonstrate that SimMatch is an effective semi-supervised PolSAR change detection method and that its performance surpasses some well-known change detection methods. Compared to the fully-supervised algorithm CWNN, the semi-supervised SimMatch algorithm improves accuracy by up to 14.4%.


Introduction
In the past several years, polarimetric synthetic aperture radar (PolSAR) data have become easily accessible due to more and more satellites launched for data collection (Zhang et al., 2021). Taking the Gaofen-3 satellite as an example, the resolution can reach 1 meter and the maximum width can reach 650 kilometers. Benefiting from artificial intelligence (AI), the methods for performing image change detection tasks on PolSAR data have advanced rapidly (Chen et al., 2020).
However, a large volume of labeled data is very expensive to collect in a real-world scenario. Learning with few labeled data has been a longstanding problem in the PolSAR image change detection research community. Among various methods, semi-supervised learning (SSL) (Chapelle et al., 2006) has recently demonstrated its significant capability for PolSAR image change detection.
A simple but very effective semi-supervised learning method is to adopt the classical two-stage training paradigm: pretraining plus fine-tuning. Models such as ResNet (He et al., 2016), Vision Transformer (ViT) (Dosovitskiy et al., 2020), and Swin-T (Liu et al., 2021) are commonly pretrained in a supervised manner on a large-scale dataset, and the learned representation is then transferred by fine-tuning the pretrained model with a few labeled samples.
Instead of separate two-stage pretraining and fine-tuning, current popular methods directly involve the labeled data in a joint feature learning paradigm with pseudo labeling (Lee, 2013) or consistency regularization (Sajjadi et al., 2016). These methods train semantic classifiers with labeled samples and use the predicted distributions as pseudo labels for unlabeled samples. Pseudo-labels typically come from weakly augmented views or from the average predictions of strongly augmented views. However, when only very limited labeled data are available, these methods suffer from severe "overconfidence" issues, i.e., the model fits confident but wrong pseudo-labels, resulting in poor performance. We notice that matching the similarity relationships at both the semantic and instance levels simultaneously is extremely beneficial for decoupling their predictions and for alleviating overfitting on noisy pseudo-labels.
In this paper, we introduce Similarity Matching (SimMatch) (Zheng et al., 2022) to improve the performance of semi-supervised PolSAR image change detection across different benchmark datasets and different settings. SimMatch, illustrated in Figure 1, offers a solution by requiring the strongly augmented view to share the same semantic similarity (i.e., label prediction) and instance characteristics (i.e., similarity between instances) with the weakly augmented view, for more intrinsic feature matching. In addition, with a labeled memory buffer, the two similarities can be isomorphically transformed into each other through aggregating and unfolding operations. The semantic and instance pseudo-labels can therefore be mutually propagated, which improves PolSAR image change detection performance. We extend SimMatch from red-green-blue (RGB) images to PolSAR image change detection. The resulting semi-supervised learning framework considers both semantic similarity and instance similarity and achieves better results than the compared methods.

Related Work
2.1 Semi-Supervised Learning.
Consistency regularization and entropy minimization methods are widely used in semi-supervised learning. Consistency regularization ensures that when perturbations are applied to the input or model, the model's response to the input remains consistent (Sajjadi et al., 2016). Regularization can be achieved through the simplest form of the loss term:

$$\left\| p_{\mathrm{model}}(y \mid A(x); \theta) - p_{\mathrm{model}'}(y \mid A(x); \theta) \right\|_2^2 \tag{1}$$

where $A(x)$ is a stochastic transformation, which can be domain-specific data augmentation (Sajjadi et al., 2016), random max pooling (Sajjadi et al., 2016), or adversarial transformation (Miyato et al., 2018). A further approach is to perturb the model $p_{\mathrm{model}'}$ rather than the input. The perturbation can be an adversarial perturbation of the model parameters $\theta$ (Zhang and Qi, 2020), or a temporal ensembling of the model at different time steps (Tarvainen and Valpola, 2017). Entropy minimization, in turn, utilizes unlabeled data in an explicit bootstrapping manner, assigning pseudo labels to unlabeled data for joint training with labeled data. Different from prior works, MixMatch (Berthelot et al., 2019), ReMixMatch (Berthelot et al., 2019), and FixMatch (Sohn et al., 2020) combine the advantages of these two methods and propose hybrid frameworks that leverage unlabeled data from both perspectives. Specifically, MixMatch adopts sharpened average predictions from multiple strongly augmented views as pseudo labels and further enhances them by using the MixUp trick (Berthelot et al., 2019). ReMixMatch inherited this idea and proposed Augmentation Anchoring, which generates pseudo labels from weakly augmented views. It also introduces a distribution alignment strategy that encourages the pseudo-label distribution to match the marginal distribution of the ground-truth class labels (Berthelot et al., 2019). FixMatch removed unnecessary mechanisms and achieved strong performance by retaining only high-confidence pseudo labels and their corresponding unlabeled data. In recent work, FlexMatch (Zhang et al., 2021) utilized inherent learning states to filter low-confidence labels class-wise as a further extension.

Self-training via pseudo labeling.
Pseudo-labels are artificial labels generated by the model itself for further training. Lee (2013) selected the class with the highest predicted probability as the pseudo label in his semi-supervised deep learning network. Under the low-density separation assumption, minimizing the entropy on pseudo labels requires the decision boundaries between clusters of unlabeled samples to lie in low-density regions. Filtering pseudo labels with a confidence threshold (Sohn et al., 2020) is a simple but effective extension, which defines the confidence of a pseudo label as the highest probability assigned to any class. Pseudo labels with confidence below the threshold are filtered out, so that training focuses on labels that are far from the decision boundaries and have high confidence (low entropy). The use of pseudo labels for self-training is an explicit classical method proposed more than a decade ago (Lee, 2013).
In recent years, it has received increasing attention in various fields, such as semi-supervised learning (Cascante-Bonilla et al., 2021), fully-supervised learning (Radosavovic et al., 2018), and domain adaptation (Kumar et al., 2020). In semi-supervised learning especially, it has regained attention in some computer vision tasks, such as image detection.
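The confidence-based filtering described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in any of the cited works: the function name `pseudo_labels`, the two-class toy batch, and the threshold value 0.95 are assumptions chosen for the example.

```python
def pseudo_labels(probs, threshold=0.95):
    """Return (label, keep) pairs for a batch of predicted class distributions.

    label is the arg-max class; keep is True only when the maximum
    probability reaches the confidence threshold (hypothetical default).
    """
    results = []
    for p in probs:
        confidence = max(p)
        label = p.index(confidence)
        results.append((label, confidence >= threshold))
    return results


# Toy batch with two classes: 0 = unchanged, 1 = changed (illustrative values).
batch = [
    [0.97, 0.03],  # confident prediction, kept
    [0.55, 0.45],  # close to the decision boundary, filtered out
    [0.02, 0.98],  # confident prediction, kept
]
selected = pseudo_labels(batch, threshold=0.95)
```

Only the samples flagged with `True` would contribute to the pseudo-label loss; the remaining samples are ignored for that training step.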

Method
This section first introduces the input data types of PolSAR images, and then provides a basic framework for semi-supervised learning with augmentation anchoring to delve into the main components of SimMatch.

The input data of SimMatch
Under the horizontal and vertical polarization basis (H, V), PolSAR can obtain the full polarization information of targets, which is characterized by the polarization scattering matrix:

$$S = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix} \tag{2}$$

where H denotes horizontally polarized transmission or reception of the electromagnetic wave signal, and V denotes vertically polarized transmission or reception.
Under the assumption of reciprocity ($S_{VH} = S_{HV}$) in the monostatic case, the polarimetric coherency matrix commonly used in PolSAR information processing can be obtained:

$$T = k_P k_P^{H} = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{bmatrix} \tag{3}$$

$$k_P = \frac{1}{\sqrt{2}} \begin{bmatrix} S_{HH} + S_{VV} & S_{HH} - S_{VV} & 2 S_{HV} \end{bmatrix}^{\mathsf{T}} \tag{4}$$

In Eq. (3), $k_P$ is the Pauli scattering vector, and $(\cdot)^{H}$ represents conjugate transposition.

Figure 4. Overview of the SimMatch pseudo-label generation process. SimMatch first uses the semantic pseudo label and the instance pseudo label generated from weakly augmented views to calculate the semantic and instance similarities through the class centers and the labeled embeddings, and then uses unfolding and aggregation operations to fuse these two similarities, ultimately obtaining the pseudo label.

The elements of $T$ are complex numbers except for the diagonal ones. The polarimetric information of each pixel can be defined by a vector $t_p$:

$$t_p = \left[ T_{11}, T_{22}, T_{33}, \mathrm{Re}[T_{12}], \mathrm{Im}[T_{12}], \mathrm{Re}[T_{13}], \mathrm{Im}[T_{13}], \mathrm{Re}[T_{23}], \mathrm{Im}[T_{23}] \right] \tag{5}$$

where $T_{ij}$ ($i, j = 1, 2, 3$) are the elements of $T$, and $\mathrm{Re}[\cdot]$ and $\mathrm{Im}[\cdot]$ denote the real part and imaginary part of $T_{ij}$, respectively.
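To make the chain from scattering matrix to per-pixel feature vector concrete, the following sketch computes the Pauli vector, a single-look coherency matrix, and a 9-element real vector for one pixel. It is an illustration under stated assumptions: no multilook averaging is applied, the ordering of the nine elements is one plausible choice, and the function names and sample scattering values are hypothetical.

```python
import math


def pauli_vector(s_hh, s_hv, s_vv):
    """Pauli scattering vector k_P under reciprocity (S_HV = S_VH)."""
    c = 1.0 / math.sqrt(2.0)
    return [c * (s_hh + s_vv), c * (s_hh - s_vv), c * 2.0 * s_hv]


def coherency_matrix(k):
    """Single-look coherency matrix T = k k^H (no spatial averaging)."""
    return [[k[i] * k[j].conjugate() for j in range(3)] for i in range(3)]


def feature_vector(t):
    """Flatten the Hermitian matrix T into 9 real values.

    Assumed order: the three real diagonal terms, then the real and
    imaginary parts of T12, T13 and T23.
    """
    feats = [t[0][0].real, t[1][1].real, t[2][2].real]
    for i, j in ((0, 1), (0, 2), (1, 2)):
        feats += [t[i][j].real, t[i][j].imag]
    return feats


# Illustrative complex scattering coefficients for a single pixel.
k = pauli_vector(1 + 1j, 0.5j, 1 - 1j)
T = coherency_matrix(k)
t_p = feature_vector(T)
```

Because `T` is Hermitian, the lower triangle carries no extra information, which is why nine real numbers are enough to describe each pixel.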
Typically, when classifying a pixel in a PolSAR image, the neighborhood window data of the pixel is also used as input, as shown in Figure 2. Let C denote the number of channels, which equals the number of elements in $t_p$, and let H and W denote the height and width of the neighborhood window. The size of the input data is then (C, H, W).
The number of channels of a single PolSAR image is 9. To detect change at the same pixel at different times, we stack two PolSAR images of the same location acquired at different times, as shown in Figure 3. The number of channels of the input data then becomes 18.
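The construction of the 18-channel input can be sketched as follows, using nested lists in (C, H, W) layout. The helper names, the 5 x 5 toy images, and the 3 x 3 window size are assumptions for illustration; the window size actually used is not stated here, and border pixels are simply rejected rather than padded.

```python
def stack_dates(img_t1, img_t2):
    """Concatenate two (C, H, W) images along the channel axis."""
    same_height = len(img_t1[0]) == len(img_t2[0])
    same_width = len(img_t1[0][0]) == len(img_t2[0][0])
    if not (same_height and same_width):
        raise ValueError("both dates must share the same height and width")
    return img_t1 + img_t2  # the outermost list is the channel axis


def extract_window(img, row, col, size):
    """Crop an odd-sized square window centred on (row, col)."""
    half = size // 2
    height, width = len(img[0]), len(img[0][0])
    if not (half <= row < height - half and half <= col < width - half):
        raise ValueError("window would cross the image border")
    return [
        [line[col - half:col + half + 1]
         for line in channel[row - half:row + half + 1]]
        for channel in img
    ]


def constant_image(channels, height, width, value):
    """Toy (C, H, W) image filled with a single value."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


date1 = constant_image(9, 5, 5, 0.0)  # 9 channels from the first acquisition
date2 = constant_image(9, 5, 5, 1.0)  # 9 channels from the second acquisition
stacked = stack_dates(date1, date2)
patch = extract_window(stacked, row=2, col=2, size=3)
shape = (len(patch), len(patch[0]), len(patch[0][0]))
```

Channels 0 to 8 of `patch` come from the first date and channels 9 to 17 from the second, so the network sees both acquisitions of the same neighborhood at once.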

SimMatch semi-supervised learning framework
The semi-supervised image change detection problem can be defined as follows. A weak augmentation function $T_w(\cdot)$ is randomly applied to a batch of labeled data $X = \{x_b : b \in (1, \ldots, B)\}$ to obtain weakly augmented samples. The encoder $F(\cdot)$ of the convolutional neural network then extracts the feature $h = F(T_w(x))$, and the fully connected class prediction head $\phi(\cdot)$ maps $h$ to the semantic similarity $p = \phi(h)$. Similarly, the weak augmentation $T_w(\cdot)$ and the strong augmentation $T_s(\cdot)$ [5,6] are randomly applied to the unlabeled data $U = \{u_b : b \in (1, \ldots, \mu B)\}$, and the same processing yields the semantic similarity $p^w$ of the weakly augmented sample (the pseudo label) and $p^s$ of the strongly augmented sample.
SimMatch also considers instance similarity, which encourages strongly augmented views and weakly augmented views to have similar similarity distributions. A nonlinear projection head $g(\cdot)$ maps the feature to a low-dimensional embedding $z_b = g(h_b)$. Following augmentation anchoring, $z_b^w$ and $z_b^s$ denote the embeddings of the weakly and strongly augmented views. Given $K$ weakly augmented embeddings of different samples $\{z_k : k \in (1, \ldots, K)\}$, the similarity function $\mathrm{sim}(\cdot)$ is used to calculate the similarity between $z_b^w$ and the $i$-th instance $z_i$. This similarity function is the dot product between $L_2$-normalized vectors, $\mathrm{sim}(u, v) = u^{\mathsf{T}} v / (\|u\| \|v\|)$. A softmax layer is applied to the similarities to obtain the similarity distribution, where $t$ is the temperature parameter that adjusts the sharpness of the distribution:

$$q_i^w = \frac{\exp\left(\mathrm{sim}(z_b^w, z_i) / t\right)}{\sum_{k=1}^{K} \exp\left(\mathrm{sim}(z_b^w, z_k) / t\right)} \tag{6}$$

Similarly, calculating the similarity $\mathrm{sim}(z_b^s, z_i)$ between the strongly augmented view $z_b^s$ and $z_i$ yields the similarity distribution:

$$q_i^s = \frac{\exp\left(\mathrm{sim}(z_b^s, z_i) / t\right)}{\sum_{k=1}^{K} \exp\left(\mathrm{sim}(z_b^s, z_k) / t\right)} \tag{7}$$
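The instance similarity distribution can be illustrated with a small pure-Python sketch: cosine similarity against a bank of embeddings, followed by a temperature-scaled softmax. The embeddings, the bank of three instances, and the temperature value 0.1 are assumptions for the example, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity, i.e. the dot product of L2-normalised vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_distribution(z, bank, t=0.1):
    """Softmax over the similarities between z and every embedding in bank."""
    logits = [cosine(z, z_k) / t for z_k in bank]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-D embeddings of K = 3 stored instances.
bank = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
q_w = similarity_distribution([0.9, 0.1], bank)  # weakly augmented view
q_s = similarity_distribution([0.6, 0.4], bank)  # strongly augmented view
```

In this toy case the weak view sits closer to the first stored instance than the strong view does, so `q_w` is sharper than `q_s`; instance-level consistency training would push `q_s` towards `q_w`.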

Label Propagation through SimMatch
Above, we considered instance-level consistency regularization. However, because the instance pseudo-label $q^w$ is generated in a completely unsupervised manner, the labeled information is largely wasted. SimMatch improves the quality of pseudo-labels by utilizing labeled information at the instance level and by adopting a strategy that allows semantic similarity and instance similarity to interact.
All labeled examples are stored in a labeled memory buffer, as shown in Figure 4 (green branch). In this way, each $z_k$ in Eq. (6) and Eq. (7) can be assigned to a specific class. The vectors in $\phi$ can be interpreted as "centered" class references, and the embeddings in the labeled memory buffer can be considered a set of "individual" class references.
Given a weakly augmented sample, the semantic similarity $p^w \in \mathbb{R}^{1 \times L}$ and the instance similarity $q^w \in \mathbb{R}^{1 \times K}$ are calculated first. (Note that since each class requires at least one sample, L is usually much smaller than K.) To calibrate $q^w$, $p^w$ must be unfolded into the K-dimensional space (denoted $p^{\mathrm{unfold}}$). This is achieved by matching each labeled embedding with its corresponding semantic similarity:

$$p_j^{\mathrm{unfold}} = p_i^w, \quad \text{where } \mathrm{class}(q_j^w) = \mathrm{class}(p_i^w) \tag{8}$$

Here $\mathrm{class}(\cdot)$ is a function that returns the ground-truth class. Specifically, $\mathrm{class}(q_j^w)$ represents the label of the $j$-th element in the memory buffer, and $\mathrm{class}(p_i^w)$ represents the $i$-th class. $p^{\mathrm{unfold}}$ is then used to scale $q^w$ and regenerate the calibrated instance pseudo label:

$$\hat{q}_j = \frac{q_j^w \, p_j^{\mathrm{unfold}}}{\sum_{k=1}^{K} q_k^w \, p_k^{\mathrm{unfold}}} \tag{9}$$

The calibrated instance pseudo label $\hat{q}$ replaces the old one $q^w$.

Figure 5. The label propagation. When the semantic and instance similarities are similar, the resulting pseudo labels are sharper; when the two similarities differ, the generated pseudo labels are flatter.

Conversely, to use instance similarity to adjust semantic similarity, $q^w$ is first aggregated into the L-dimensional space (denoted $q^{\mathrm{agg}}$) by summing the instance similarities that share the same ground-truth label:

$$q_i^{\mathrm{agg}} = \sum_{j=1}^{K} \mathbb{1}\left(\mathrm{class}(p_i^w) = \mathrm{class}(q_j^w)\right) q_j^w \tag{10}$$

The semantic pseudo label adjusted by smoothing $p^w$ with $q^{\mathrm{agg}}$ can be written as:

$$\hat{p}_i = \alpha \, p_i^w + (1 - \alpha) \, q_i^{\mathrm{agg}} \tag{11}$$

where $\alpha$ is a hyper-parameter that controls the weight of the semantic and instance information. Similarly, the adjusted semantic pseudo label replaces the old one $p_i^w$. In this way, the pseudo labels $\hat{p}$ and $\hat{q}$ contain both semantic-level and instance-level information. As shown in Figure 5, if the two similarities differ, the result is flatter and does not contain high probability values. On the other hand, when the semantic and instance similarities are close, the two distributions agree with each other's predictions, resulting in sharper pseudo labels with high confidence for certain classes.
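The unfolding and aggregation steps can be sketched in a few lines. This is a hedged illustration, not the reference implementation: it assumes that the smoothing of the semantic pseudo label is a convex combination weighted by α, and the value α = 0.9, the two-class setting, and the four-entry memory buffer are hypothetical.

```python
def unfold(p_w, labels):
    """Expand class probabilities (length L) to buffer entries (length K)."""
    return [p_w[c] for c in labels]


def aggregate(q_w, labels, num_classes):
    """Sum instance similarities (length K) per ground-truth class (length L)."""
    agg = [0.0] * num_classes
    for q, c in zip(q_w, labels):
        agg[c] += q
    return agg


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def propagate(p_w, q_w, labels, alpha=0.9):
    """Return the adjusted semantic and instance pseudo labels."""
    p_unfold = unfold(p_w, labels)
    q_hat = normalize([q * p for q, p in zip(q_w, p_unfold)])
    q_agg = aggregate(q_w, labels, len(p_w))
    p_hat = [alpha * p + (1.0 - alpha) * a for p, a in zip(p_w, q_agg)]
    return p_hat, q_hat


labels = [0, 0, 1, 1]        # class of each labeled embedding in the buffer
p_w = [0.8, 0.2]             # semantic pseudo label, L = 2
q_w = [0.4, 0.3, 0.2, 0.1]   # instance pseudo label, K = 4
p_hat, q_hat = propagate(p_w, q_w, labels)
```

Here both similarities favor class 0, so the calibrated instance pseudo label `q_hat` puts more mass on the class-0 buffer entries than `q_w` did, which matches the sharpening behavior described for agreeing distributions.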

Efficient Memory Buffer
As mentioned earlier, SimMatch has a memory buffer to store the feature embeddings and ground-truth labels of the labeled data. Specifically, SimMatch defines a feature memory buffer $Q_f \in \mathbb{R}^{K \times D}$ and a label memory buffer $Q_l \in \mathbb{R}^{K \times 1}$, where K is the number of labeled samples and D is the embedding size. For $Q_l$, only one scalar needs to be stored for each label, and the aggregation and unfolding operations can be implemented with the gather and scatter-add functions of deep learning libraries (Zheng et al., 2022).
SimMatch adopts two different implementations for different buffer sizes. When K is large, it follows MoCo (He et al., 2020) and uses a student-teacher framework, with the two networks denoted $F_s$ and $F_t$.
In this case, the labeled data and the strongly augmented data are passed into $F_s$, while the weakly augmented data are fed into $F_t$ to generate pseudo labels.
When K is small, on the other hand, a temporal ensembling strategy (French et al., 2017) suffices to smooth the features in the memory buffer, which can be written as:

$$z_t \leftarrow m \, z_{t-1} + (1 - m) \, z_t \tag{12}$$

where $m$ is the momentum coefficient. In this case, all samples are passed directly to the same encoder.
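For the small-buffer case, the feature smoothing can be read as an exponential moving average over the stored embedding of each labeled sample. The sketch below assumes that reading; the function name `ema_update`, the momentum symbol `m`, and its value are illustrative and not taken from the text.

```python
def ema_update(buffer, index, z_new, m=0.7):
    """Smooth the stored embedding of one labeled sample in place.

    buffer[index] holds the previous embedding; z_new is the embedding
    computed in the current step; m is a hypothetical momentum value.
    """
    old = buffer[index]
    buffer[index] = [m * o + (1.0 - m) * n for o, n in zip(old, z_new)]
    return buffer[index]


# Toy feature buffer with K = 2 labeled samples and embedding size D = 2.
features = [[1.0, 0.0], [0.0, 1.0]]
updated = ema_update(features, 0, [0.0, 1.0], m=0.5)
```

Only the entry of the sample seen in the current step changes; the other rows of the buffer keep their previous values until their samples are visited again.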

Loss
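As described above, the labeled samples are directly optimized through the cross-entropy loss with the ground-truth labels:

$$\mathcal{L}_s = \frac{1}{B} \sum_{b=1}^{B} H(y_b, p_b) \tag{13}$$

where $H(\cdot, \cdot)$ denotes the cross entropy and $y_b$ is the ground-truth label of $x_b$. The unsupervised loss is defined by the cross entropy between the semantic similarity of the weakly augmented sample $p^w$ (the pseudo label) and that of the strongly augmented sample $p^s$:

$$\mathcal{L}_u = \frac{1}{\mu B} \sum_{b=1}^{\mu B} \mathbb{1}\left(\max \mathrm{DA}(p_b^w) > \tau\right) H\left(\mathrm{DA}(p_b^w), p_b^s\right) \tag{14}$$

Here, we retain only the unlabeled samples whose maximum class probability in the pseudo label is greater than the confidence threshold $\tau$ (Sohn et al., 2020). $\mathrm{DA}(\cdot)$ represents a distribution alignment strategy (Berthelot et al., 2019), which balances the distribution of pseudo labels. It only requires maintaining the moving average $p_{\mathrm{avg}}^w$ of the pseudo labels and using $\mathrm{Normalize}(p^w / p_{\mathrm{avg}}^w)$ to adjust the current $p^w$ (Li et al., 2021).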
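The interaction between distribution alignment and confidence masking can be sketched as follows. The function names, the threshold value 0.95, the uniform running average, and the toy predictions are assumptions for illustration; the loss is averaged over the whole unlabeled batch, including the masked samples.

```python
import math


def distribution_alignment(p_w, p_avg):
    """Normalize(p_w / p_avg): rescale by the running mean, then renormalise."""
    scaled = [p / a for p, a in zip(p_w, p_avg)]
    total = sum(scaled)
    return [s / total for s in scaled]


def cross_entropy(target, pred, eps=1e-12):
    """Cross entropy H(target, pred) for two discrete distributions."""
    return -sum(t * math.log(q + eps) for t, q in zip(target, pred))


def unsupervised_loss(weak_probs, strong_probs, p_avg, tau=0.95):
    """Masked cross entropy between aligned weak and strong predictions."""
    total = 0.0
    for p_w, p_s in zip(weak_probs, strong_probs):
        target = distribution_alignment(p_w, p_avg)
        if max(target) > tau:  # keep only confident pseudo labels
            total += cross_entropy(target, p_s)
    return total / len(weak_probs)


weak = [[0.98, 0.02], [0.60, 0.40]]    # predictions on weak views
strong = [[0.90, 0.10], [0.50, 0.50]]  # predictions on strong views
p_avg = [0.5, 0.5]                     # hypothetical running average
loss = unsupervised_loss(weak, strong, p_avg)
```

In this toy batch only the first sample passes the threshold, so the second sample contributes nothing to the loss even though it is still counted in the batch average.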
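Additionally, note that $\mathrm{DA}(p^w)$ is used directly as the pseudo label, rather than a sharpened or one-hot version of $p^w$. Besides, we achieve instance-level consistency regularization by minimizing the difference between $q^s$ and $q^w$. Here, the cross-entropy loss is used:

$$\mathcal{L}_{in} = \frac{1}{\mu B} \sum_{b=1}^{\mu B} H(q_b^w, q_b^s) \tag{15}$$

The overall training objective of our model is:

$$\mathcal{L}_{overall} = \mathcal{L}_s + \lambda_u \mathcal{L}_u + \lambda_{in} \mathcal{L}_{in} \tag{16}$$

where $\lambda_u$ and $\lambda_{in}$ are balancing factors that control the weights of the two unsupervised losses.

Experiments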

Dataset
The dataset was acquired by the unmanned aerial vehicle synthetic aperture radar (UAVSAR) sensor over different regions of Los Angeles (LA), with each dataset consisting of two images acquired on April 23, 2009, and May 11, 2015. LA1 has a height of 786 pixels and a width of 300 pixels, and LA2 has a height of 766 pixels and a width of 300 pixels. Figure 6 shows the Pauli pseudo-color maps of LA1 and LA2 at the two times and their ground-truth (GT) maps. The semi-supervised change detection experiment uses 100 labeled changed pixels as training samples. For each dataset, we randomly select the same number of training samples for the changed and unchanged classes, and the remaining pixels are used for performance evaluation. The overall accuracy (OA), Kappa coefficient, precision, recall, and F1 score are used to evaluate change detection performance (Lee, 2013). We compare SimMatch with CWNN (Gao et al., 2019), a well-known fully-supervised SAR image sea ice change detection method based on a convolutional-wavelet neural network.
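For reference, the five evaluation metrics can be computed from the binary confusion matrix as sketched below, with the changed class treated as positive. The function name and the eight-pixel toy example are illustrative, and the zero-division guards are a design choice of this sketch.

```python
def change_detection_metrics(y_true, y_pred):
    """OA, Kappa, precision, recall and F1 for binary change maps (1 = changed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    oa = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (oa - pe) / (1.0 - pe) if pe < 1.0 else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"OA": oa, "Kappa": kappa, "Precision": precision,
            "Recall": recall, "F1": f1}


# Toy example with 8 pixels (illustrative values, not from the datasets).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = change_detection_metrics(y_true, y_pred)
```

The Kappa coefficient discounts the agreement expected by chance, which makes it more informative than OA when unchanged pixels dominate the scene.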

Results
The change detection results and performance evaluation on the UAVSAR LA1 dataset are shown in Figure 7 and Table 1. CWNN achieved impressive results, but some unchanged areas were detected as changed, which degraded its change detection performance. The SimMatch algorithm introduced in this paper achieved excellent performance on this dataset, with slightly better accuracy than CWNN even under the constraint of semi-supervised learning with few labeled samples. Most of the changed and unchanged regions were correctly identified. The OA, Kappa coefficient, recall, and F1 score are 2.18%, 0.107, 0.199, and 0.031 higher than those of CWNN, respectively.
Figure 7 and Table 2 show the change detection results on the LA2 dataset. The performance of CWNN is very low: many unchanged areas are identified as changed, while the changed area in the upper right corner is almost entirely missed.
SimMatch again performs much better than CWNN, even though it is trained in the semi-supervised setting. Compared with CWNN, the OA improves by 14.4%, the Kappa coefficient increases by 0.603, the recall increases by 0.492, and the F1 score increases by 0.617.
To further explore the optimal performance of SimMatch, we conducted a fully-supervised experiment on the UAVSAR dataset using 5% of labeled samples, with 3664 samples on the LA1 dataset and 2318 samples on the LA2 dataset. SimMatch achieved excellent results on both datasets, with the OA reaching its highest levels of 95.08% and 98.31%, respectively. This experiment further validates the performance of SimMatch under fully-supervised learning.
The above experiments and analysis indicate that SimMatch can achieve good performance in PolSAR image change detection. Even with a small number of training samples, its performance surpasses the advanced fully-supervised change detection method used for comparison. The framework considers both semantic-level and instance-level consistency regularization and uses memory buffers to fully utilize instance-level data annotations, which gives it strong feature extraction capabilities.

Conclusion
This paper introduces the semi-supervised learning framework SimMatch into the field of PolSAR image change detection. In SimMatch, consistency regularization is applied simultaneously at the semantic level and the instance level, associating semantic similarity with instance similarity. The two types of similarity can be propagated to each other through unfolding and aggregation operations. SimMatch also has a labeled memory buffer that fully utilizes the ground-truth labels at the instance level. Experiments show that SimMatch can generate higher-quality and more reliable matching targets with a small amount of labeled data. The change detection results on the UAVSAR dataset demonstrate state-of-the-art performance for semi-supervised learning.

Figure 1. The vectors of the fully connected layer are regarded as the semantic representation or class center of each category. SimMatch uses a labeled memory buffer to fully utilize instance-level labels.

Figure 2. The input neighborhood window data of a pixel in a PolSAR image.C denotes the number of channels.H and W denote the height and width of the neighborhood window of a pixel.

Figure 3. The input data formed by stacking data from different acquisition times.

Table 1. The change detection performance evaluation on the UAVSAR LA1 dataset.