COMPARISON OF WAVELET-BASED AND HHT-BASED FEATURE EXTRACTION METHODS FOR HYPERSPECTRAL IMAGE CLASSIFICATION

Hyperspectral images, which contain rich and fine spectral information, can be used to identify surface objects and improve land use/cover classification accuracy. Because hyperspectral data are high-dimensional, traditional statistics-based classifiers cannot be applied directly to such images when training samples are limited. This problem is referred to as the “curse of dimensionality.” The common remedy is dimensionality reduction, and for hyperspectral images feature extraction is the more frequently used approach. There are two types of feature extraction methods: those based on the statistical properties of the data and those based on time-frequency analysis. In this study, time-frequency analysis methods are used to extract features for hyperspectral image classification. Wavelet-based feature extraction has been shown to be an effective tool for spectral feature extraction. The Hilbert-Huang transform (HHT), a relatively new time-frequency analysis tool, has been widely used in nonlinear and nonstationary data analysis. In this study, the wavelet transform and the HHT are applied to hyperspectral data for physical spectral analysis, so that a small number of salient features can be obtained, reducing the dimensionality of hyperspectral images while maintaining classification accuracy. An AVIRIS data set is used to test the performance of the proposed HHT-based feature extraction methods, and the results are compared with wavelet-based feature extraction. According to the experimental results, HHT-based feature extraction methods are effective tools, and their results are similar to those of wavelet-based feature extraction methods.


INTRODUCTION
Imaging spectrometry, a technology developed in the 1980s, can acquire hundreds of spectral bands simultaneously (Goetz et al., 1985). The images acquired with imaging spectrometers are called hyperspectral images. These images not only reveal two-dimensional spatial information but also contain rich and fine spectral information. With these characteristics, they can be used to identify surface objects and improve land use/cover classification accuracy. Over the past three decades, hyperspectral images have been widely used in fields such as mineral identification, vegetation mapping, and disaster investigation (Goetz et al., 1985).
Because hyperspectral data are high-dimensional, image processing methods that have been applied effectively to multispectral data are not as well suited to hyperspectral data. For instance, traditional statistical classification methods are ineffective when applied to hyperspectral images with limited training samples. In other words, as the dimensionality increases with the number of bands, the number of training samples needed for classification increases as well (Hsu, 2007). This has been termed the "curse of dimensionality" by Bellman (1961). The common remedy for the "curse of dimensionality" is dimensionality reduction, which can be divided into two types: feature selection and feature extraction. For hyperspectral images, feature extraction is the more frequently used approach (Hsu, 2003).
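One concrete symptom of this problem can be sketched numerically. A statistical classifier that relies on a per-class covariance matrix needs that matrix to be invertible, but a covariance estimated from n samples has rank at most n - 1. The snippet below is only an illustration on synthetic Gaussian data (not on the AVIRIS data); the function name `covariance_rank` and the sample sizes are assumptions chosen for the example, while 224 is the number of bands quoted later for the data set.

```python
import numpy as np

def covariance_rank(n_samples, n_bands, seed=0):
    """Rank of the sample covariance estimated from synthetic training spectra."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for the training pixels of one class.
    samples = rng.normal(size=(n_samples, n_bands))
    cov = np.cov(samples, rowvar=False)  # n_bands x n_bands estimate
    return int(np.linalg.matrix_rank(cov))

n_bands = 224
for n_samples in (50, 500):
    rank = covariance_rank(n_samples, n_bands)
    status = "invertible" if rank == n_bands else "singular"
    print(f"{n_samples} samples, {n_bands} bands: rank {rank} ({status})")
```

With fewer samples than bands the estimate is singular, which is why either more training samples or fewer features are required.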
There are two types of feature extraction methods. The first type is based on the statistical properties of the data. For instance, the principal components transform (PCT) is the simplest and most commonly used method. Although it accounts for the distribution of the whole data set, it can easily neglect features that are useful for hyperspectral data. Discriminant analysis feature extraction (DAFE) maximizes the between-class scatter while minimizing the within-class scatter. Moreover, decision boundary feature extraction (DBFE), proposed by Lee and Landgrebe (1993), finds useful features from the decision boundaries between classes. Although DAFE and DBFE are effective and practical algorithms, they have some disadvantages. For example, the maximum number of features in DAFE is the number of classes minus one. In addition, adequate training samples are still needed to obtain reliable parameters in DAFE or to compute the decision boundaries in DBFE (Fukunaga, 1990; Lee and Landgrebe, 1993).
The other type of feature extraction method is based on time-frequency analysis. For example, wavelet-based feature extraction has been shown to be an appropriate and effective tool for spectral feature extraction (Hsu, 2003). However, this method has some disadvantages: the wavelet basis function has to be selected in advance, and the method is not suitable for nonlinear data analysis. The Hilbert-Huang transform (HHT) is a relatively new adaptive time-frequency analysis tool. It combines empirical mode decomposition (EMD) and Hilbert spectral analysis (HSA), and has been used extensively in nonlinear and nonstationary data analysis. In this study, the wavelet transform and the HHT are applied to hyperspectral data for physical spectral analysis. The spectral features are then extracted from the results of this analysis, so that we obtain a small number of salient features, reduce the dimensionality of hyperspectral images, and maintain classification accuracy. In our experiment, an AVIRIS data set is used to test the performance of the proposed HHT-based methods. Finally, the results are compared with wavelet-based feature extraction methods.

HILBERT-HUANG TRANSFORM
The Hilbert-Huang transform (HHT), first proposed by Huang et al. (1998), is a time-frequency analysis tool for nonlinear and nonstationary data. The HHT, which consists of empirical mode decomposition (EMD) and Hilbert spectral analysis (HSA), is described briefly in this section.

Empirical mode decomposition
Empirical mode decomposition (EMD) adaptively decomposes time-series data into a series of intrinsic mode functions (IMFs). These IMFs cover different frequency ranges, and each IMF has two properties (Huang, 2005):
1. The number of extrema and the number of zero-crossings of an IMF must be equal or differ at most by one.
2. The envelopes defined by the local maxima and the local minima of an IMF are symmetric with respect to zero.
The EMD consists of the following steps:
1. Identify all the local maxima of the signal $x(t)$ and connect them with a cubic spline to form the upper envelope. Repeat the procedure for the local minima to generate the lower envelope.
2. Compute the mean $m_1$ of the upper and lower envelopes, and let
$$h_1 = x(t) - m_1.$$
The procedure that obtains the IMF components is called the sifting process.
3. The proto-IMF $h_1$ may not satisfy the definition of an IMF. Repeat the sifting process $k$ times, $h_{1k} = h_{1(k-1)} - m_{1k}$, until the stoppage criteria are met.
4. As soon as the component satisfies the criteria, we obtain the first IMF, $c_1 = h_{1k}$, and separate $c_1$ from $x(t)$:
$$r_1 = x(t) - c_1. \tag{3}$$
5. Since the residue $r_1$ still contains information with long periods, it is treated as the new data and the sifting process is repeated. The result is
$$r_2 = r_1 - c_2, \ \ldots, \ r_n = r_{n-1} - c_n. \tag{4}$$
Finally, summing equations (3) and (4), we obtain
$$x(t) = \sum_{j=1}^{n} c_j + r_n.$$
The EMD separates variations from the mean, and each IMF has its own physical meaning.
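The sifting steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the stoppage criteria are replaced by a fixed number of sifting iterations (`max_sift`), the two end points of the signal are appended to both sets of extrema to limit spline overshoot, and the function names `envelope_mean`, `sift`, and `emd` are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def envelope_mean(t, x):
    """Mean of the cubic-spline upper and lower envelopes, or None if too few extrema."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Assumption: end points are added to both envelopes as a simple boundary treatment.
    hi = np.concatenate(([0], maxima, [len(x) - 1]))
    lo = np.concatenate(([0], minima, [len(x) - 1]))
    upper = CubicSpline(t[hi], x[hi])(t)
    lower = CubicSpline(t[lo], x[lo])(t)
    return 0.5 * (upper + lower)

def sift(t, x, max_sift=10):
    """Subtract the envelope mean repeatedly (fixed count stands in for the stoppage criteria)."""
    h = x.copy()
    for _ in range(max_sift):
        m = envelope_mean(t, h)
        if m is None:
            break
        h = h - m
    return h

def emd(t, x, max_imfs=8, max_sift=10):
    """Return (imfs, residue) such that imfs.sum(axis=0) + residue reproduces x."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(t, residue) is None:
            break  # residue has too few extrema to sift further
        c = sift(t, residue, max_sift)
        imfs.append(c)
        residue = residue - c
    return np.array(imfs), residue

# Usage on a synthetic two-tone signal with a linear trend.
t = np.linspace(0.0, 1.0, 400)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) + t
imfs, residue = emd(t, x)
print(imfs.shape, np.allclose(imfs.sum(axis=0) + residue, x))
```

Because each IMF is subtracted from the running residue, the decomposition is complete by construction: the IMFs plus the final residue sum back to the input.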

Hilbert Spectral Analysis
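Having obtained the IMF components, we can apply the Hilbert transform to each IMF component $c_j(t)$ and compute the instantaneous frequency. The Hilbert transform gives the complex conjugate $y_j(t)$,
$$y_j(t) = \frac{1}{\pi} P \int_{-\infty}^{\infty} \frac{c_j(\tau)}{t - \tau}\, d\tau,$$
where $P$ denotes the Cauchy principal value, and we have an analytic signal:
$$z_j(t) = c_j(t) + i\, y_j(t) = a_j(t)\, e^{i\theta_j(t)},$$
where $a_j(t) = \sqrt{c_j^2(t) + y_j^2(t)}$ is the instantaneous amplitude and $\theta_j(t) = \arctan\left(y_j(t)/c_j(t)\right)$ is the phase angle. The instantaneous frequency is $\omega_j(t) = d\theta_j(t)/dt$. As a consequence, we can express the original data as the real part, RP, in the following form:
$$x(t) = \mathrm{RP} \sum_{j=1}^{n} a_j(t) \exp\left(i \int \omega_j(t)\, dt\right).$$
Therefore, the Hilbert spectrum can be defined as:
$$H(\omega, t) = \mathrm{RP} \sum_{j=1}^{n} a_j(t) \exp\left(i \int \omega_j(t)\, dt\right).$$
We can also define the marginal Hilbert spectrum as:
$$h(\omega) = \int_0^T H(\omega, t)\, dt.$$
In summary, the HHT, consisting of EMD and HSA, decomposes data adaptively and computes the instantaneous frequency by differentiation rather than convolution. The HHT is thus a superior tool for analyzing nonlinear and nonstationary data.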
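The Hilbert spectral analysis described above can be sketched numerically as follows. This is a hedged illustration, not the authors' code: it uses `scipy.signal.hilbert` to form the analytic signal, approximates the instantaneous frequency by a finite-difference derivative of the unwrapped phase, and builds a discrete Hilbert spectrum by accumulating amplitude into frequency bins. The function names and the binning scheme (`n_bins` equal-width bins up to the Nyquist frequency) are assumptions made for the example.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_attributes(c, dt=1.0):
    """Instantaneous amplitude and angular frequency of one IMF-like component."""
    z = hilbert(c)                      # analytic signal c(t) + i*y(t)
    amplitude = np.abs(z)               # a(t)
    phase = np.unwrap(np.angle(z))      # theta(t), unwrapped to be continuous
    omega = np.gradient(phase, dt)      # omega(t) = d(theta)/dt by finite differences
    return amplitude, omega

def hilbert_spectrum(imfs, dt=1.0, n_bins=32):
    """Discrete H(omega, t) with amplitude binned by frequency, plus its marginal spectrum."""
    imfs = np.atleast_2d(imfs)
    n_t = imfs.shape[1]
    edges = np.linspace(0.0, np.pi / dt, n_bins + 1)  # 0 up to the Nyquist angular frequency
    H = np.zeros((n_bins, n_t))
    for c in imfs:
        a, w = instantaneous_attributes(c, dt)
        # Map each instantaneous frequency to a bin; out-of-range values go to the edge bins.
        k = np.clip(np.digitize(w, edges) - 1, 0, n_bins - 1)
        H[k, np.arange(n_t)] += a
    marginal = H.sum(axis=1) * dt       # discrete version of the integral over time
    return H, marginal, edges

# Usage: a pure 50 Hz cosine sampled at 1 kHz.
dt = 0.001
t = np.arange(1000) * dt
c = np.cos(2 * np.pi * 50 * t)
amplitude, omega = instantaneous_attributes(c, dt)
H, marginal, edges = hilbert_spectrum(c, dt)
print(amplitude.mean(), omega.mean() / (2 * np.pi), marginal.argmax())
```

For real IMFs the instantaneous frequency is noisier than in this clean example, so some smoothing or coarser binning may be needed in practice.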

Comparisons between Wavelet Transform and Hilbert-Huang Transform
The wavelet transform and the Hilbert-Huang transform are both time-frequency analysis tools, so they can analyze the variation of data in both the time and frequency domains. Table 1 shows some differences between the wavelet transform and the HHT. First, the wavelet transform has a complete theoretical basis and requires a basis function to be defined before use, whereas the HHT has an empirical theoretical basis and an adaptive basis, so it can analyze data adaptively. Second, the wavelet transform computes frequency by convolution, while in the HHT the frequency is derived by differentiation. Both the wavelet transform and the HHT, however, can present their results in time-frequency-energy space. Finally, the wavelet transform is suitable for nonstationary data but unsuitable for nonlinear data, whereas the HHT is suitable for both nonlinear and nonstationary data. Therefore, the HHT is a superior tool for time-frequency analysis of nonlinear and nonstationary data (Huang, 2005).

Datasets Description
In this study, an AVIRIS data set is used to test the performance of the wavelet transform and the HHT in hyperspectral image feature extraction and classification. The AVIRIS data set is shown in Figure 1.

Wavelet-Based Feature Extraction
The orthogonal wavelet transform decomposes a signal into low-frequency components that represent the optimal approximation and high-frequency components that represent the detailed information of the original signal (Mallat, 1989). The decomposition coefficients in a wavelet orthogonal basis can be computed with a fast algorithm that cascades discrete convolutions with the conjugate mirror filters (CMF) $h$ and $g$ and subsamples the outputs. The decomposition equations are as follows:
$$a_{j+1}[p] = \sum_{n} h[n - 2p]\, a_j[n], \qquad d_{j+1}[p] = \sum_{n} g[n - 2p]\, a_j[n],$$
where $a_j$ denotes the approximation coefficients at scale $2^j$, and $a_{j+1}$ and $d_{j+1}$ are respectively the approximation and detail coefficients at scale $2^{j+1}$ (Mallat, 1999). Because of the subsampling, the number of coefficients is reduced each time the conjugate mirror filters are applied. With these properties, wavelet decomposition is used for dimensionality reduction of hyperspectral images (Hsu, 2003).
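The filter-and-subsample cascade described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: it uses the two-tap Haar filters for brevity rather than a longer wavelet, periodic extension at the boundary, and an even-length input at every level; the names `dwt_level` and `approximation_features` are hypothetical. Sign conventions for the high-pass filter vary between references.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
H_HAAR = np.array([1.0, 1.0]) / SQRT2    # low-pass conjugate mirror filter (Haar, assumed)
G_HAAR = np.array([1.0, -1.0]) / SQRT2   # high-pass conjugate mirror filter (Haar, assumed)

def dwt_level(a, h=H_HAAR, g=G_HAAR):
    """One decomposition level: filter with h and g, keep every second output."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    half = n // 2                         # assumes an even number of samples
    approx = np.zeros(half)
    detail = np.zeros(half)
    for p in range(half):
        for k in range(len(h)):
            idx = (2 * p + k) % n         # periodic boundary extension
            approx[p] += h[k] * a[idx]    # a_{j+1}[p] = sum_n h[n - 2p] a_j[n]
            detail[p] += g[k] * a[idx]    # d_{j+1}[p] = sum_n g[n - 2p] a_j[n]
    return approx, detail

def approximation_features(spectrum, level):
    """Approximation coefficients after `level` cascaded decompositions."""
    a = np.asarray(spectrum, dtype=float)
    for _ in range(level):
        a, _detail = dwt_level(a)
    return a

# Usage: a 224-band synthetic spectrum shrinks to 224 / 2**level coefficients.
spectrum = np.linspace(0.0, 1.0, 224) ** 2
for level in (3, 4, 5):
    print(level, len(approximation_features(spectrum, level)))
```

Each level halves the number of approximation coefficients, which is the mechanism behind the dimensionality reduction.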
Linear and nonlinear wavelet-based feature extraction (WFE) methods are applied to hyperspectral images in this study. In linear WFE, the approximation coefficients $a_j$ are regarded as features for classification. Nonlinear WFE, on the other hand, considers the approximation coefficients $a_j$ as well as the detail coefficients $d_j$, and selects the M largest wavelet coefficients as important features for classification (Hsu, 2003). In the experiments, the wavelet function used in linear and nonlinear WFE is the Daubechies 3 wavelet (Daubechies wavelet with 3 vanishing moments). The experimental procedure of wavelet-based feature extraction is illustrated in Figure 2.
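The selection step of nonlinear WFE can be illustrated with a short sketch. It assumes that "largest" means largest in absolute value and that the selected coefficients keep their original order; neither detail is stated in this section, and how the selected positions are made consistent across pixels is not covered here. The function name `largest_m_coefficients` is hypothetical.

```python
import numpy as np

def largest_m_coefficients(coeffs, m):
    """Positions and values of the m coefficients with the largest magnitude."""
    coeffs = np.asarray(coeffs, dtype=float)
    # Rank by absolute value (assumption), then restore the original ordering.
    order = np.argsort(np.abs(coeffs))[::-1][:m]
    positions = np.sort(order)
    return positions, coeffs[positions]

# Usage on a toy vector of approximation and detail coefficients.
coeffs = np.array([0.1, -5.0, 2.0, 0.0, 3.0, -0.2])
positions, values = largest_m_coefficients(coeffs, 3)
print(positions, values)
```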

HHT-Based Feature Extraction
Because the HHT is a time-frequency analysis tool, it is applied to the spectral curve of each pixel in the hyperspectral image. First, the Hilbert-Huang transform is applied to a spectral curve, and the instantaneous frequency and amplitude of each component are calculated. The Hilbert spectrum is then formed from the instantaneous frequency and amplitude. The residual information is also considered in this spectrum. After that, the M largest values in the Hilbert spectrum are selected as the important features of the spectral curve for classification. These features are sorted by the band at which each feature is located. If two or more features are located at the same band, they are sorted by frequency. Finally, the extracted features are used as the inputs for classification. A maximum likelihood classifier is used in this study. The procedure of feature extraction using the Hilbert-Huang transform is also illustrated in Figure 2.

Figure 4 shows the classification accuracies of the WFE methods. The results of linear and nonlinear WFE methods are similar: the best accuracies of linear WFE (level 3), linear WFE (level 4), linear WFE (level 5) and nonlinear WFE are 85%, 86.67%, 89.33% and 83.33%, respectively. In addition, the classification accuracy of linear WFE increases slightly as the level of decomposition increases.

Experiment III: Comparison of Classification Results between Wavelet-Based and HHT-Based Feature Extraction Methods
The purpose of experiment III is to compare the performance of the wavelet-based and HHT-based methods. The classification accuracies of the different methods are shown in Figure 5. First of all, the classification accuracies of the WFE methods and the HHT-based methods all conform to the Hughes phenomenon, in which classification accuracy first increases and then declines as the number of features grows.
Comparing the results of the different methods, linear and nonlinear WFE have similar classification results, as mentioned in section 4.2. The results of the unsupervised HHT-based method are similar to those of the WFE methods, but its accuracies decrease markedly when the number of features exceeds ten. Finally, supervised HHT-based feature extraction achieves better classification results than any of the other methods.

According to the experiments, the results of the unsupervised HHT-based method are similar to those of the WFE methods implemented in this study, but its accuracies are unstable as the number of features increases. When the separability of different classes is computed with training samples, the supervised HHT-based method gives better results than the unsupervised HHT-based method and reaches 90% classification accuracy with six or seven features. Furthermore, its classification accuracies are higher than those of linear and nonlinear WFE. By extracting features from the Hilbert spectrum, we can not only reduce the dimensionality of the hyperspectral image but also obtain a small number of salient features for classification. Therefore, the Hilbert-Huang transform is an appropriate and effective tool for hyperspectral image analysis.
In the future, the effectiveness of HHT-based methods could still be improved. In addition, the objects in the experiments are mainly minerals. A further objective is to investigate whether the HHT-based feature extraction methods proposed in this study are also suitable for other kinds of material, such as metropolitan or vegetation areas, and whether they give results similar to or better than those of WFE methods.
Figure 1(a) is the well-known Cuprite data set, which covers a mineral region in Nevada. The image size of the test field is 350×350 pixels, and the number of bands is 224. Figure 1(b) shows a mineral map produced in 1995 by USGS. In this study, we choose 6 classes from this map (Table 2).

Figure 1. An AVIRIS data set of Cuprite

Figure 2. The flow chart of wavelet-based or HHT-based feature extraction

Figure 3 shows the classification accuracies of the different HHT-based feature extraction methods. Both the unsupervised and the supervised HHT-based methods achieve good classification accuracies. The classification results of both conform to the Hughes phenomenon: with a constant number of training samples, accuracy first increases and then declines as the number of features increases. Compared with the unsupervised HHT-based method, the supervised HHT-based method clearly improves classification accuracy. Supervised HHT-based feature extraction achieves its best classification accuracy (90%) with six and seven features, whereas the unsupervised HHT-based method reaches a lower best accuracy (81.33%) with four features. Therefore, by calculating the separability of different classes with training samples, the supervised HHT-based method gives better results than the unsupervised HHT-based method.

Figure 3. Comparison of classification accuracies between HHT-based methods

Figure 5. Comparison of classification accuracies between wavelet-based and HHT-based feature extraction methods

The chosen classes are used for feature extraction and classification. Table 2 also shows the number of training samples and check samples for image classification.

Table 2. The 6 chosen classes