Dimensionality Reduction of Hyperspectral Images by Combination of Non-parametric Weighted Feature Extraction (NWFE) and Modified Neighborhood Preserving Embedding (NPE)

This paper combines two conventional feature extraction methods (NWFE and NPE) in a novel framework and presents a new semi-supervised feature extraction method called Adjusted Semi-supervised Discriminant Analysis (ASEDA). The advantages of this method are mitigation of the Hughes phenomenon, automatic selection of unlabelled pixels, extraction of more than L-1 features (L: number of classes), and avoidance of singularity or near-singularity of the within-class scatter matrix. Experimental results on well-known hyperspectral datasets demonstrate that the overall classification accuracy increases compared to conventional feature extraction algorithms.


INTRODUCTION
High-dimensional data appear in many applications of classification, data mining and machine learning. Hyperspectral images consist of hundreds of spectral bands that provide an effective means of discriminating subtly different phenomena in earth observation (Schott, 2007). The curse of dimensionality arises as the number of bands in hyperspectral images increases (Jun and Ghosh, 2011). Feature extraction is commonly applied as a preprocessing step to overcome the curse of dimensionality. Much work has been carried out in the literature to overcome this issue. The main approaches can be categorized in three groups: i) regularization of the sample covariance matrix; ii) adaptive statistics estimation by exploiting classified (semi-labeled) samples; iii) preprocessing techniques based on feature selection/extraction, aimed at reducing/transforming the original feature space into another space of lower dimensionality (Jimenez and Landgrebe, 1998). Feature extraction, one of the main approaches, transforms the original feature space into a subspace of lower dimension (Kuo and Landgrebe, 2001).

In the last decade, many methods have been proposed that can be categorized into supervised and unsupervised methods (He et al., 2005). One of the most popular unsupervised methods is Principal Component Analysis (PCA), which does not preserve the local neighborhood properties of classes, so it is commonly used for visual interpretation (He et al., 2005). Neighborhood Preserving Embedding (NPE) is another unsupervised method; it can preserve local neighborhood information and overcome the overfitting problems of supervised methods (Liao et al., 2011). A weakness of the unsupervised methods is the random selection of pixels, a problem we try to solve in this paper.

On the other hand, supervised methods maximize the class discrimination of the data (Chang, 2000). The best-known supervised method is Linear Discriminant Analysis (LDA), which aims to maximize the ratio of between-class to within-class scatter (Fukunaga, 1990). Many studies focus on the definition of between-class and within-class scatter matrices in parametric and nonparametric fashion. Bor-Chen Kuo and David Landgrebe proposed the NWFE method to obtain a full-rank between-class scatter matrix and extract more features (Kuo, 2004). Jinghua Wang et al. proposed orthogonal features to avoid singularity or near-singularity of the within-class matrix. Regularized techniques have been proposed in (Tatyana, 2009) to solve the singularity of the within-class matrix. Most supervised methods do not consider neighborhood information.

In this paper, we propose a method called Adjusted Semi-supervised Discriminant Analysis (ASEDA) in order to preserve local information and maximize the distance between the classes. Combining the scatter matrices derived from the NWFE and NPE algorithms eliminates the weaknesses of both unsupervised and supervised methods. Examination on benchmark hyperspectral data sets demonstrates an improvement in classification accuracy using ASEDA compared to conventional feature extraction methods.

METHODOLOGY
Let $\{x_i\}_{i=1}^{N}$, $x_i \in \mathbb{R}^d$, denote the high-dimensional data and $\{y_i\}_{i=1}^{N}$, $y_i \in \mathbb{R}^r$, denote the low-dimensional data, where $r \ll d$. In our application, d is the number of spectral bands of the hyperspectral image and r is the dimensionality of the projected subspace. The assumption is that there exists a mapping function f that maps every original data point $x_i$ to $y_i = f(x_i)$ such that most of the information in the high-dimensional data is kept in a much lower-dimensional projected subspace (Alipour, 2012).

NWFE
Supervised feature extraction approaches compute an optimal transformation (projection) by minimizing the within-class distance and maximizing the between-class distance simultaneously, thus achieving maximum class discrimination. The optimal transformation can be readily computed by applying an eigendecomposition to the so-called scatter matrices. NWFE obtains a transformation matrix W that maximizes the between-class scatter matrix Sb while minimizing the within-class scatter matrix Sw.

$$ y_i = W^T x_i \qquad (1) $$

where W is chosen to maximize the between-class scatter relative to the within-class scatter. $S_b$ and $S_w$ are computed using the training data, $M_i$ is the number of samples in class i, and C is the number of classes.
In NWFE, the within-class, between-class and total scatter matrices are defined following (Kuo, 2004). Bor-Chen Kuo used the regularized form in Equation (5) to overcome singularity or near-singularity of the within-class scatter matrix.
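As an illustration of how nonparametric scatter matrices of this kind can be computed, the following is a minimal sketch and not the authors' implementation. It assumes the commonly published NWFE formulation: inverse-distance weighted local means, scatter weights that favor samples close to the local mean, and a 0.5/0.5 blend of Sw with its diagonal as the regularization. The function names `nwfe_scatter` and `nwfe_projection`, the exclusion of a sample from its own local mean, and the small constant `eps` are assumptions made for this example; each class is assumed to contain at least two samples.

```python
import numpy as np

def nwfe_scatter(X, labels, eps=1e-12):
    """Return (Sb, Sw) for samples X of shape (N, d) and a label array of shape (N,)."""
    N, d = X.shape
    classes = np.unique(labels)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for ci in classes:
        Xi = X[labels == ci]
        prior = len(Xi) / N  # class prior estimated from the training counts
        for cj in classes:
            Xj = X[labels == cj]
            # Inverse-distance weights from each class-i sample to each class-j sample.
            dist = np.linalg.norm(Xi[:, None, :] - Xj[None, :, :], axis=2)
            inv = 1.0 / (dist + eps)
            if ci == cj:
                np.fill_diagonal(inv, 0.0)  # assumption: a sample is left out of its own local mean
            inv /= inv.sum(axis=1, keepdims=True)
            local_mean = inv @ Xj  # weighted (local) mean of class j for each class-i sample
            diff = Xi - local_mean
            # Samples close to the local mean receive a larger scatter weight.
            lam = 1.0 / (np.linalg.norm(diff, axis=1) + eps)
            lam /= lam.sum()
            S = prior * (diff * lam[:, None]).T @ diff / len(Xi)
            if ci == cj:
                Sw += S
            else:
                Sb += S
    return Sb, Sw

def nwfe_projection(X, labels, r):
    """Return a (d, r) matrix whose columns are the leading eigenvectors of Sw^-1 Sb."""
    Sb, Sw = nwfe_scatter(X, labels)
    Sw = 0.5 * Sw + 0.5 * np.diag(np.diag(Sw))  # assumed form of the regularization
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-vals.real)
    return vecs[:, order[:r]].real
```

Because the between-class matrix is built from per-sample differences rather than from class means alone, it is not limited to rank L-1, so `r` may exceed the number of classes minus one. Projected features are then obtained as `X @ W`.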

NPE
Neighborhood Preserving Embedding is a linear approximation to the locally linear embedding feature extraction method. The algorithmic procedure is formally stated below.

Step 1: Select neighborhoods
In this step, the i-th node corresponds to the data point xi. In the standard NPE method there are two ways to construct the adjacency graph:
i) K nearest neighbors (KNN): put a directed edge from node i to node j if xj is among the K nearest neighbors of xi.
ii) ε-neighborhood: put an edge between nodes i and j if dist(xi, xj) ≤ ε.
The selection of pixels in this step is random, which leads to different overall accuracies of the classification map from run to run. We propose spectral and spatial criteria for selecting unlabeled pixels using the training data. The spectral criterion is the spectral angle mapper (De in Equation (6)), which is insensitive to noise; the spatial criterion is the neighborhood of the training pixels (Da in Equation (7)).
In Equation (6), X is an unlabeled pixel and µi is the mean of the training samples of the i-th class.
In Equation (7), (ic, jc) are the image coordinates of a labeled pixel and (ni, nj) are the coordinates of an arbitrary candidate pixel.
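The two selection criteria can be sketched as follows. This is a hedged illustration only: the spectral angle is the standard arccosine of the normalized inner product, while the square window used for the spatial criterion, the thresholds `max_angle` and `radius`, and the function names are assumptions introduced for this example and are not taken from Equations (6) and (7). Pixel spectra are assumed to be non-zero.

```python
import numpy as np

def spectral_angle(x, mu):
    """Spectral angle (in radians) between a pixel spectrum x and a class mean mu."""
    cos = np.dot(x, mu) / (np.linalg.norm(x) * np.linalg.norm(mu))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def select_unlabeled(cube, train_coords, class_means, max_angle=0.1, radius=1):
    """Pick unlabeled pixels that lie near a training pixel and resemble some class mean.

    cube: (rows, cols, bands) image; train_coords: list of (ic, jc) tuples;
    class_means: (C, bands) array of per-class mean training spectra.
    """
    rows, cols, _ = cube.shape
    labeled = set(train_coords)
    selected = set()
    for ic, jc in train_coords:
        # Spatial criterion (assumed): square window of the given radius around (ic, jc).
        for ni in range(max(0, ic - radius), min(rows, ic + radius + 1)):
            for nj in range(max(0, jc - radius), min(cols, jc + radius + 1)):
                if (ni, nj) in labeled:
                    continue
                # Spectral criterion: smallest angle to any class mean must be small enough.
                angles = [spectral_angle(cube[ni, nj], mu) for mu in class_means]
                if min(angles) <= max_angle:
                    selected.add((ni, nj))
    return sorted(selected)
```

Because the selection depends only on the training pixels and the image, repeated runs return the same unlabeled set, which is the property the deterministic criteria are meant to provide.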

Step 2: Computing the weights
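The weights on the edges can be computed by minimizing the following objective function, the standard NPE reconstruction error, in which $Q_{ij}$ is the weight of the edge from node i to node j and is zero when $x_j$ is not a neighbor of $x_i$:

$$ \min_{Q} \sum_{i} \Big\| x_i - \sum_{j} Q_{ij} x_j \Big\|^2 \quad \text{subject to} \quad \sum_{j} Q_{ij} = 1 $$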
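A minimal sketch of this weight computation is given below, assuming the usual locally linear reconstruction setting in which each point is approximated by an affine combination of its K nearest neighbors. The function name `npe_weights`, the KNN neighborhood, and the small regularization term `reg` (added so that the local Gram matrix is invertible) are assumptions of this example.

```python
import numpy as np

def npe_weights(X, k=3, reg=1e-3):
    """Return the (N, N) weight matrix Q for data X of shape (N, d)."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    Q = np.zeros((N, N))
    # Squared Euclidean distances; the diagonal is masked so a point is never its own neighbor.
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(D, np.inf)
    for i in range(N):
        nbrs = np.argsort(D[i])[:k]
        Z = X[nbrs] - X[i]          # neighbors expressed relative to x_i
        G = Z @ Z.T                 # local Gram matrix
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        Q[i, nbrs] = w / w.sum()    # enforce the sum-to-one constraint
    return Q
```

For evenly spaced points on a line with `k=2`, an interior point is reconstructed from its two neighbors with equal weights of 0.5, which is a quick sanity check of the constraint handling.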

Step 3: Computing the Projections
In this step, we compute the linear projections. Generally, the projection matrix in locally linear feature extraction methods can be written in terms of two matrices; for NPE [13], these are I and (I-Q)'(I-Q).
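The projection step can be sketched as below, assuming the standard NPE generalized eigenvalue problem X M X' a = λ X X' a with M = (I-Q)'(I-Q), whose eigenvectors for the smallest eigenvalues form the projection. Here the data are stored row-wise, so X'MX and X'X appear instead of XMX' and XX'. The function name `npe_projection` and the ridge term `reg` are assumptions of this example.

```python
import numpy as np

def npe_projection(X, Q, r, reg=1e-6):
    """Return (W, vals): a (d, r) projection and the r smallest generalized eigenvalues.

    X: (N, d) data with one sample per row; Q: (N, N) reconstruction weights.
    """
    N, d = X.shape
    M = (np.eye(N) - Q).T @ (np.eye(N) - Q)
    A = X.T @ M @ X
    B = X.T @ X + reg * np.eye(d)   # small ridge keeps B positive definite
    # Reduce A a = lambda B a to a symmetric standard problem via B = L L'.
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T
    C = (C + C.T) / 2               # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    W = Linv.T @ vecs[:, :r]
    return W, vals[:r]
```

The low-dimensional features are then `X @ W`. The Cholesky reduction is used only to stay within NumPy; a dedicated generalized symmetric eigensolver would serve equally well.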

COMBINATION OF NWFE AND NPE IN NATURAL FRAMEWORK
If all data X are partitioned into two sections, X = [Xlabeled, Xunlabeled], the mapping matrix in our method is defined from the labeled and unlabeled parts.

To obtain the projection matrix, we solve the generalized eigenvalue problem of the proposed ASEDA method. Through its nonlinear combination of supervised and unsupervised components, the proposed ASEDA seeks a projection direction on which the local neighborhood information of the data is best preserved while the class discrimination is simultaneously maximal.

EXPERIMENTAL RESULTS
To evaluate the proposed method, two well-known hyperspectral datasets are used. The first is the Indian Pine data set, a sub-image of AVIRIS data with a size of 145×145 pixels that was acquired over the Indian Pine test site in northwest Indiana in June 1992 and has 16 classes. The data have 220 spectral bands with a spatial resolution of 20 m. The 20 water absorption channels and 15 noisy channels were removed, resulting in a total of 185 channels. The calibrated data are available online (along with detailed ground-truth information) from http://cobweb.ecn.purdue.edu/~biehl/.

The second data set is DC Mall. It was collected with an airborne sensor system over the Washington DC Mall, with 1280×307 pixels and 210 spectral bands in the 0.4-2.4 µm region. This data set consists of 191 spectral bands after elimination of water absorption and noisy bands and is available at http://cobweb.ecn.purdue.edu/~biehl/.

The experiment has been designed so that the performance of ASEDA is compared to its counterparts and to conventional feature extraction methods. To evaluate the performance of ASEDA, the Overall Classification Accuracy (OCA) of SVM, QDC and 1-NN over the examined datasets has been used. In recent years, techniques such as Principal Component Analysis (PCA), classical LDA (Tatyana, 2009), NWFE (Kuo, 2004), Semi-supervised Discriminant Analysis (SDA) (Cai, 2008), Semi-supervised Local Fisher Discriminant (SELF) (Liao, 2011) and Semi-supervised Local Discriminant Analysis (SELD) (Liao, 2011) have been proposed.
Table III shows the OCA of the LDC, QDC and K-NN classifiers using PCA, LDA, NWFE, SDA, SELF and SELD over the Indian Pine and DC datasets.
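For reference, the overall classification accuracy is the fraction of test pixels whose predicted label equals the ground-truth label. The sketch below computes it for a 1-NN classifier applied to already extracted features; the function names and the tiny synthetic data in the usage note are illustrative assumptions, not the experimental setup of this paper.

```python
import numpy as np

def one_nn_predict(train_X, train_y, test_X):
    """Label each test sample with the label of its nearest training sample."""
    dist = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    return train_y[np.argmin(dist, axis=1)]

def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the reference label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

With training points (0, 0) and (10, 10) labeled 0 and 1, the test points (1, 1), (9, 9), (0, 1) and (8, 10) are predicted as 0, 1, 0 and 1; against reference labels 0, 1, 0, 0 this gives an overall accuracy of 0.75.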

Figure 1: A color composite of the Indian Pine dataset

TABLE I NUMBER OF TRAINING AND TEST SAMPLES FOR EACH CLASS OF THE INDIAN PINE DATA SET.

TABLE II NUMBER OF TRAINING AND TEST SAMPLES FOR EACH CLASS OF THE DC

Applying the proposed method to the examined datasets yields features that are mainly discriminative, because the degeneracy of Sw^-1 Sb is largely reduced. The best result was obtained by ASEDA with the 1-NN classifier on both datasets.

The authors would like to thank Prof. Landgrebe for providing the AVIRIS Indian Pines data and ground truth image.