HIGH RESOLUTION POLSAR IMAGE CLASSIFICATION BASED ON GENETIC ALGORITHM AND SUPPORT VECTOR MACHINE

This paper focuses on the selection of backscattering mechanisms and on supervised classification for CETC38-X PolSAR images. Thanks to the high radar resolution, many classes of man-made objects are visible in the images. Land-use classification therefore becomes a more meaningful application of PolSAR images, but it involves the selection of both classifiers and backscattering mechanisms. In this paper we apply SVM as the classifier and GA as the feature selection method. After finding the best parameters and the most suitable polarimetric information, the overall accuracy reaches 97.49%. The results show that SVM is an effective algorithm compared to the Wishart and BP classifiers.
E-mail address: pxli@whu.edu.cn


INTRODUCTION
Thanks to the high radar resolution, many classes of man-made objects, as well as vegetation and ground, are now visible in PolSAR images, so PolSAR image classification has become more significant. Unfortunately, previous work has shown that the problems of SAR image processing are far from being solved by a gain in resolution. High resolution helps discriminate small objects, but it also creates new problems (Tison, 2004). A highly robust classifier is therefore required. Previous research has shown that SVM (Support Vector Machine), which is based on SRM (Structural Risk Minimization), is a good algorithm for classification and regression. In the recent literature, it has been used in many fields, for instance, matching of SAR and optical images (Hui, 2004) and SAR image target recognition (Xue, 2005).
The 4-D coherency matrix $T_4$ and covariance matrix $C_4$ were proposed to describe distributed targets, and they reduce to 3-D matrices under the reciprocity constraint. From $T_3$ or $C_3$, we can extract a great deal of backscattering mechanism information by means of polarimetric target decompositions. Intuitively, we try to obtain more features to improve the classification accuracy, but the complicated relationships among them are often harmful. We can therefore first select the most useful backscattering features before classification. As a classical heuristic algorithm, GA (Genetic Algorithm) is routinely used to generate solutions to optimization and search problems (Goldberg, 1989). In this paper, we use this algorithm to select features in order to enhance the subsequent classification accuracy.

Support Vector Machine
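In machine learning, SVMs, also named support vector networks (Cortes, 1995), are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. For simplicity, let us first consider a supervised binary classification problem. Let us assume that the training set consists of $N$ vectors $x_i \in \mathbb{R}^{d}$ $(i = 1, 2, \ldots, N)$ from the $d$-dimensional feature space $X$, each associated with a target $y_i \in \{-1, +1\}$. The linear SVM classification approach consists of looking for a separation between the two classes in $X$ by means of an optimal hyperplane that maximizes the separating margin. In the nonlinear case, which is the most commonly used because data are often linearly nonseparable, the data are first mapped with a kernel method into a higher dimensional feature space, i.e., $\Phi(x) \in \mathbb{R}^{d'}$ $(d' > d)$. The class decision is based on the sign of the discriminant function $f(x)$, which is associated with the hyperplane in the transformed space and is defined as

$$f(x) = w^{*} \cdot \Phi(x) + b^{*}.$$

The optimal hyperplane defined by the weight vector $w^{*} \in \mathbb{R}^{d'}$ and the bias $b^{*} \in \mathbb{R}$ is the one that minimizes a cost function that expresses a combination of two criteria, namely: 1) margin maximization and 2) error minimization. It is expressed as

$$\Psi(w, \xi) = \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{N} \xi_i.$$

This cost function minimization is subject to the constraints

$$y_i \left( w \cdot \Phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, 2, \ldots, N,$$

where $\xi_i$ are the so-called slack variables introduced to account for nonseparable data. The constant $C$ represents a regularization parameter that controls the shape of the discriminant function and, consequently, the decision boundary when data are nonseparable. The above optimization problem can be reformulated as

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

under the constraints

$$\sum_{i=1}^{N} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \qquad i = 1, 2, \ldots, N.$$

The final result is a discriminant function conveniently expressed as

$$f(x) = \sum_{i \in S} \alpha_i^{*} y_i K(x_i, x) + b^{*},$$

where $K(\cdot, \cdot)$ is a kernel function. The set $S$ is a subset of the indices $\{1, 2, \ldots, N\}$ corresponding to the nonzero Lagrange multipliers $\alpha_i$, which define the so-called support vectors. The kernel function we use is the Gaussian function

$$K(x_i, x) = \exp\left( -\gamma \|x_i - x\|^{2} \right),$$

where $\gamma$ represents a parameter inversely proportional to the width of the Gaussian kernel.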
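To make the kernel form of the discriminant function concrete, the following minimal sketch evaluates a Gaussian kernel and the sign of a kernel expansion over a set of support vectors. It is an illustration only: the support vectors, multipliers, bias and kernel parameter are hand-picked placeholder values, not trained values and not taken from the paper.

```python
import math


def gaussian_kernel(x_i, x, gamma):
    """Gaussian kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x))
    return math.exp(-gamma * sq_dist)


def discriminant(x, support_vectors, labels, alphas, bias, gamma):
    """Kernel expansion over the support vectors plus the bias term."""
    return bias + sum(
        alpha * y * gaussian_kernel(sv, x, gamma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )


def predict(x, support_vectors, labels, alphas, bias, gamma):
    """Binary decision: the sign of the discriminant function."""
    value = discriminant(x, support_vectors, labels, alphas, bias, gamma)
    return 1 if value >= 0 else -1


# Hypothetical, hand-picked values for illustration (not trained values).
SUPPORT_VECTORS = [(0.0, 0.0), (2.0, 2.0)]
LABELS = [-1, 1]
ALPHAS = [1.0, 1.0]
BIAS = 0.0
GAMMA = 0.5

near_positive = predict((1.8, 2.1), SUPPORT_VECTORS, LABELS, ALPHAS, BIAS, GAMMA)
near_negative = predict((0.1, -0.2), SUPPORT_VECTORS, LABELS, ALPHAS, BIAS, GAMMA)
```

A larger `gamma` makes each support vector's influence more local, which is why this parameter has to be tuned together with the penalty factor.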
The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes it belongs to, making it a binary classifier. To apply it to multi-class problems, different multi-class strategies can be adopted. Here we adopt the one-against-one voting strategy.
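The one-against-one strategy trains one binary classifier per pair of classes, so k classes need k(k-1)/2 classifiers (10 for five classes), and the class with the most pairwise votes wins. The sketch below shows only the voting step. The toy threshold classifiers stand in for trained binary SVMs, and the tie-breaking rule is an assumption, since the paper does not specify one.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(x, classes, binary_classifiers):
    """Let every class pair vote once and return the most voted class."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = binary_classifiers[(a, b)](x)  # returns either a or b
        votes[winner] += 1
    # Assumption: ties are broken by the order of `classes`.
    return max(classes, key=lambda c: votes[c])


def make_threshold_classifier(a, b, centers):
    """Toy stand-in for a trained binary SVM: pick the nearer class center."""
    def classify(x):
        return a if abs(x - centers[a]) <= abs(x - centers[b]) else b
    return classify


# Hypothetical 1-D class centers, for illustration only.
CENTERS = {"A": 0.0, "B": 5.0, "C": 10.0}
CLASSES = sorted(CENTERS)
CLASSIFIERS = {
    (a, b): make_threshold_classifier(a, b, CENTERS)
    for a, b in combinations(CLASSES, 2)
}

label_near_b = one_against_one_predict(4.0, CLASSES, CLASSIFIERS)
label_near_c = one_against_one_predict(9.0, CLASSES, CLASSIFIERS)
```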

Genetic Algorithm
GA is a search heuristic from the artificial intelligence field of computer science that mimics the process of natural evolution. It finds applications in a large number of fields, such as engineering, chemistry and bioinformatics.
There are three main operations: 1) the selection operation keeps the better individuals based on their fitness, 2) the crossover operation recombines individuals with a certain probability, and 3) the mutation operation maintains the diversity of the population, but its probability must not be set too large.
In a genetic algorithm, a population of individuals, each a candidate solution to an optimization problem, is evolved toward better solutions. Each individual has a set of chromosomes which can be mutated and altered. At first, individuals were represented only in binary code; other encodings appeared later. In this paper, we use binary code, so each bit, 0 or 1, indicates whether the corresponding feature is used.
The evolution starts from a population of randomly generated individuals and updates it in every generation, so it is an iterative process. In each generation, the fitness of all individuals is evaluated. The fitness is usually the value of the objective function in the optimization problem; here it is the overall accuracy. Fitter individuals are more likely to be preserved in the next generation, and with a certain probability individuals are also recombined and randomly mutated, which forms a new generation. The evolution stops when either the maximum number of generations or a satisfactory fitness level is reached. The best individual in the last generation is the solution we want.
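The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: tournament selection, single-point crossover, bit-flip mutation and elitism are assumed operator choices, `crossover_prob` and `mutation_prob` are assumed values, and the toy fitness (agreement with a hidden bit mask) replaces the overall classification accuracy.

```python
import random


def run_ga(fitness, n_genes, pop_size=30, generations=100,
           crossover_prob=0.8, mutation_prob=0.02, seed=0):
    """Evolve binary chromosomes; return the best one and the fitness history."""
    rng = random.Random(seed)

    def tournament(population):
        # Selection: the fitter of two randomly drawn individuals survives.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]

    for _ in range(generations):
        next_population = [best[:]]  # elitism: keep the best individual
        while len(next_population) < pop_size:
            parent1 = tournament(population)
            parent2 = tournament(population)
            child = parent1[:]
            # Crossover: single cut point, applied with a given probability.
            if n_genes > 1 and rng.random() < crossover_prob:
                cut = rng.randint(1, n_genes - 1)
                child = parent1[:cut] + parent2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_prob else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical target mask standing in for "the truly useful features".
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1]


def toy_fitness(chromosome):
    """Fraction of bits that agree with the hidden target mask."""
    return sum(g == t for g, t in zip(chromosome, TARGET)) / len(TARGET)


best, history = run_ga(toy_fitness, len(TARGET))
```

Because the best individual is copied unchanged into each new generation, the best fitness in `history` never decreases. Without elitism, a good individual could be lost to crossover or mutation.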

Experimental Data
This paper uses one data set, the CETC38-X Paddyland data set, which was acquired by the X-band dual-antenna PolInSAR system in Linshui City, Hainan Province. The spatial resolution is about 0.5 m in the range and azimuth directions.

Cross Validation
Here SVM is used for multi-class classification. 30 samples of each class are randomly selected as the training set. With an SVM using the Gaussian kernel, two parameters need to be set: the kernel parameter $\gamma$ and the penalty factor $C$. The classification accuracy can change dramatically with their values, as shown in Fig. 3 (a)-(c). We use the Cross Validation (CV) algorithm to obtain the best $\gamma$ and $C$. The final experimental results show that the best $\gamma$ and $C$ are 0.03125 and 445.7219, and the corresponding accuracy is 93.17%, as shown in Fig. 3 (d).
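A common way to combine cross validation with parameter tuning is an exhaustive grid search over powers of two. The reported optimum 0.03125 equals 2^-5 and 445.7219 is approximately 2^8.8, which is consistent with such a grid, although the paper does not state the grid it used. The sketch below is a minimal illustration under stated assumptions: the number of folds, the grid ranges and the function names are hypothetical, and `toy_train_and_score` is a synthetic stand-in that must be replaced by real SVM training and scoring.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(train_and_score, n_samples, k, gamma, C):
    """Average the score over k train/test splits for one (gamma, C) pair."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx, gamma, C))
    return sum(scores) / k


def grid_search(train_and_score, n_samples, k, log2_gammas, log2_Cs):
    """Try every (gamma, C) on a base-2 grid and keep the best CV accuracy."""
    best = (-1.0, None, None)
    for lg in log2_gammas:
        for lc in log2_Cs:
            gamma, C = 2.0 ** lg, 2.0 ** lc
            acc = cross_validated_accuracy(train_and_score, n_samples, k, gamma, C)
            if acc > best[0]:
                best = (acc, gamma, C)
    return best


def toy_train_and_score(train_idx, test_idx, gamma, C):
    """Synthetic stand-in for 'train an SVM on train_idx, score on test_idx'.

    It ignores the data and peaks at gamma = 2^-3, C = 2^4 (arbitrary values).
    """
    return 1.0 / (1.0 + (math.log2(gamma) + 3) ** 2 + (math.log2(C) - 4) ** 2)


# 150 samples corresponds to 30 training samples for each of five classes.
best_acc, best_gamma, best_C = grid_search(
    toy_train_and_score, n_samples=150, k=5,
    log2_gammas=range(-8, 3), log2_Cs=range(-2, 11),
)
```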

Features Selection
After finding $\gamma$ and $C$, the next issue is the selection of multiple backscattering mechanisms. We use GA to select the backscattering features. With binary coding, each gene in a chromosome indicates whether a feature is used. The overall classification accuracy is treated as the fitness. The population size and the number of generations are 30 and 100, respectively. Fig. 3 (e) and (f) show that the selected features improve the classification accuracy to 97.49%. TABLE 4 lists the features that we use at the beginning and those that we select later.
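The binary encoding can be illustrated with a short sketch that decodes a chromosome into a feature subset and wraps an accuracy evaluator as the GA fitness. Everything here is hypothetical: the feature names are placeholders rather than the contents of TABLE 4, the convention that 1 means "selected" is an assumption, and `toy_accuracy` is a trivial threshold rule standing in for training and testing the SVM.

```python
def selected_features(chromosome, feature_names):
    """Names of the features whose gene is 1."""
    return [name for gene, name in zip(chromosome, feature_names) if gene == 1]


def apply_mask(samples, chromosome):
    """Keep only the feature columns whose gene is 1."""
    return [[v for gene, v in zip(chromosome, row) if gene == 1]
            for row in samples]


def make_fitness(samples, labels, evaluate_accuracy):
    """Wrap an accuracy evaluator so it can serve as the GA fitness."""
    def fitness(chromosome):
        if not any(chromosome):
            return 0.0  # an empty feature subset cannot classify anything
        return evaluate_accuracy(apply_mask(samples, chromosome), labels)
    return fitness


def toy_accuracy(samples, labels):
    """Stand-in evaluator: threshold the first retained feature at 0.5."""
    predictions = [1 if row[0] > 0.5 else 0 for row in samples]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Placeholder feature names and values, for illustration only.
FEATURE_NAMES = ["entropy", "anisotropy", "alpha", "span"]
SAMPLES = [[0.9, 0.1, 0.2, 0.3], [0.2, 0.8, 0.1, 0.4]]
LABELS = [1, 0]
CHROMOSOME = [1, 0, 1, 0]

fitness = make_fitness(SAMPLES, LABELS, toy_accuracy)
chosen = selected_features(CHROMOSOME, FEATURE_NAMES)
```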

Contrast Results
In this section, we compare the SVM classifier with two other classifiers: Wishart and BPNN (back-propagation neural network). The classification results are shown in Fig. 5 and TABLE 6.
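The accuracy and kappa values reported for the classifiers are derived from confusion matrices such as those in TABLE 6. As a minimal sketch, both measures can be computed as follows. The 2x2 matrix is a made-up example, not data from the paper, and rows are assumed to be reference classes with columns as predicted classes.

```python
def overall_accuracy(cm):
    """Correctly classified samples (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm):
    """Cohen's kappa: agreement beyond what is expected by chance."""
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(col) for col in zip(*cm)]
    p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_observed - p_chance) / (1 - p_chance)


# Made-up confusion matrix, for illustration only.
CM = [[45, 5],
      [10, 40]]

oa = overall_accuracy(CM)
k = kappa(CM)
```

For this example the overall accuracy is 0.85 and the chance agreement is 0.5, so kappa is 0.7. Kappa is lower than the overall accuracy because it discounts the agreement a random assignment with the same class proportions would already achieve.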
In previous research, the complex Wishart measure was derived to classify PolSAR images (Lee, 2009). But the Wishart measure does not work well for very high resolution SAR images. The supervised Wishart classification result is shown in Fig. 5 (c), and its overall classification accuracy is 74.85%. The Wishart classifier uses the covariance matrix and the coherency matrix directly, but it omits the physical decomposition results. The results also show that the BP classifier does not perform well on this data set. Among the three, only the SVM classifier achieves a good overall accuracy.

CONCLUSIONS
Thanks to the development of SAR sensors and imaging technology, many man-made objects have become observable in PolSAR images in recent years, which brings new PolSAR image classification tasks. Although the coherency matrix $T_3$ and the covariance matrix $C_3$ contain a large amount of polarimetric information, this blended information is of little use for analyzing targets precisely. We therefore need to extract different kinds of information based on polarimetric target decompositions. How to make effective use of these features, however, is still an unresolved problem.
This paper introduces GA and SVM to high resolution PolSAR image classification. First, we present the theory of SVM and GA. Then, we verify the effectiveness of the method by comparing three classifiers on the CETC38-X Paddyland data set. The Wishart classifier uses the $C_3$ or $T_3$ matrix directly but cannot use additional polarimetric information, and thus its overall accuracy is only 74.85%. In contrast, the BPNN classifier can use additional information, but in theory it is prone to falling into local minima, so its accuracy is still not good. Finally, the SVM classifier achieves the best result: owing to its SRM theory and its use of additional information, the accuracy reaches 97.49%. Experimentally, SVM is clearly an effective algorithm for classifying high resolution PolSAR images. In future work, we will focus on improving the classification capability via algorithm integration.

Figure 2. PolSAR image of Paddyland in Linshui City, with a resolution of about 0.5 m in the range and azimuth directions. The original image size is 2048x2048 pixels, and 5x5 multi-look processing is applied to reduce the speckle noise. (a) Pauli-basis image: red for |HH-VV|, green for 2|HV|, and blue for |HH+VV|. (b) Ground truth regions of the five classes, which correspond to paddy in different growth stages.

Figure 3. The influences of different combinations of $\gamma$ and $C$.

Figure 5. Contrast of three classifiers. (a) SVM classifier, accuracy is 97.49%, kappa is 0.9641. (b) BPNN classifier, accuracy is 70.12%, kappa is 0.5887. (c) Wishart classifier, accuracy is 74.85%.

TABLE 6. Confusion matrices of SVM and BPNN. (a) Confusion matrix of SVM. (b) Confusion matrix of BPNN.

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-7/W1, 3rd ISPRS IWIDF 2013, 20 - 22 August 2013, Antu, Jilin Province, PR China