CLASSIFICATION ACCURACY INCREASE USING MULTISENSOR DATA FUSION

The practical use of very high resolution visible and near-infrared (VNIR) data is still growing (IKONOS, Quickbird, GeoEye-1, etc.), but for classification purposes the number of bands is limited in comparison to full spectral imaging. These limitations may lead to confusion between materials such as different roofs, pavements, roads, etc., and may therefore lead to wrong interpretation and use of classification products. Employment of hyperspectral data is another solution, but its low spatial resolution (compared to multispectral data) restricts its usage for many applications. Another improvement can be achieved by fusion of multisensory data, since this may increase the quality of scene classification. Integration of Synthetic Aperture Radar (SAR) and optical data is widely performed for automatic classification, interpretation, and change detection. In this paper we present an approach for very high resolution SAR and multispectral data fusion for automatic classification in urban areas. Single polarization TerraSAR-X (SpotLight mode) and multispectral data are integrated using the INFOFUSE framework, consisting of feature extraction (information fission), unsupervised clustering (data representation on a finite domain and dimensionality reduction), and data aggregation (Bayesian or neural network). This framework allows a relevant way of multisource data combination following consensus theory. The classification is not influenced by the limitations of dimensionality, and the calculation complexity primarily depends on the step of dimensionality reduction. Fusion of single polarization TerraSAR-X, WorldView-2 (VNIR or full set), and Digital Surface Model (DSM) data allows different types of urban objects to be classified into predefined classes of interest with increased accuracy. A comparison to classification results of WorldView-2 multispectral data (8 spectral bands) is provided, and the numerical evaluation of the method in comparison to other established methods illustrates the advantage in classification accuracy for many classes such as buildings, low vegetation, sport objects, forest, roads, rail roads, etc.


INTRODUCTION
Availability of high and very high spatial resolution multisensory data opens new perspectives for processing, recognition, and decision making in urban areas containing a variety of objects and structures. Nevertheless, high resolution data are provided by optical sensors with limited spectral resolution. For example, the well known satellites providing high resolution data (IKONOS, Quickbird, GeoEye-1) acquire multispectral data only in the VNIR range, the new WorldView-2 satellite being the exception. The limited spectral range covered by multispectral sensors permits neither a high accuracy of thematic classification nor a relatively high number of classes. Employment of hyperspectral data is not a solution because of the low spatial resolution of most spaceborne sensors. Data fusion is employed to overcome this limitation on spatial resolution. Different modalities and different types of digital data (e.g. multispectral, SAR, Digital Elevation Model (DEM), Geographic Information System (GIS), vector maps, etc.) allow a significant increase in the accuracy of automatic recognition and interpretation for urban areas, but only when a correct fusion methodology is used.
A fusion methodology should properly deal with the different statistics of incommensurable multisensory input data (e.g. optical and SAR). Several fusion methodologies following consensus theory (Benediktsson et al., 1997) have been developed and successfully used (Pacifici et al., 2008; Fauvel et al., 2006; Rottensteiner et al., 2004), but the number of thematic classes is still low. Pacifici et al. (2008) developed the best fusion algorithm for the 2007 GRSS Data Fusion Contest. The algorithm is based on a neural network classification enhanced by preprocessing and postprocessing. Employment of 2 SAR images, 6 Landsat-5 spectral images, and 6 Landsat-7 spectral images resulted in classification into 5 classes (City center, Residential area, Sparse buildings, Water, Vegetation) with a Kappa coefficient of 0.93. Fauvel et al. (2006) applied decision fusion (fuzzy decision rule) to the classification of an urban area. The overall classification accuracy for 6 classes (Large buildings, Houses, Large roads, Streets, Open areas, and Shadows) is 75.7 %.
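The two quality measures quoted above, overall accuracy and the Kappa coefficient, can both be derived from a confusion matrix. The following is a minimal sketch using the standard definitions (Cohen's kappa); the function name and the example matrix are illustrative assumptions and are not taken from the cited studies.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that the classifier assigned to class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Chance agreement: sum over classes of (row total * column total) / n^2.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    if expected == 1.0:
        # Degenerate case (all samples in one class): kappa is undefined.
        return observed, float("nan")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (values invented for illustration only).
oa, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
```

Kappa corrects the overall accuracy for agreement expected by chance, which is why it is lower than the overall accuracy whenever chance agreement is greater than zero.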

PROPOSED FUSION MODEL
Instead of a continuous representation of the data, a discrete representation on a finite domain is employed. The discrete representation is motivated by the fact that integration of incommensurable multisensory data of different nature and statistics can be difficult using conventional statistical methods. To overcome this difficulty, a kind of "discretization" of the continuous data is employed, resulting in data with several possible states (e.g. a multinomial distribution, see (Aksoy et al., 2005)). A neural network, a Bayesian network, or discrete graphical models are employed to integrate the multisensory data with discrete states.
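The fusion framework consists of three main steps:
1. Information fission: feature extraction from input data. The aim of this step is to extract as much information as possible from the input data (Palubinskas and Datcu, 2008).
2. Unsupervised clustering: representation of the data on a finite domain and dimensionality reduction.
3. Data aggregation: fusion of the discrete representations by a Bayesian or neural network.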
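To make the idea of aggregating discrete states concrete, the sketch below combines per-feature cluster indices with a naive Bayes classifier. This is a deliberately simple stand-in chosen for illustration: the text names Bayesian or neural networks for aggregation but does not specify their structure, so the conditional-independence assumption, the Laplace smoothing parameter `alpha`, and all function names are assumptions.

```python
import math


def train_naive_bayes(samples, labels, n_states, alpha=1.0):
    """Fit a categorical naive Bayes model on discrete feature states.

    samples:  list of tuples, one cluster index per feature (0 .. n_states-1)
    labels:   list of class identifiers, one per sample
    alpha:    Laplace smoothing constant (an assumption of this sketch)
    """
    classes = sorted(set(labels))
    n_features = len(samples[0])
    class_count = {c: 0 for c in classes}
    # state_count[c][f][s]: training samples of class c with feature f in state s
    state_count = {c: [[0] * n_states for _ in range(n_features)] for c in classes}
    for x, y in zip(samples, labels):
        class_count[y] += 1
        for f, s in enumerate(x):
            state_count[y][f][s] += 1

    total = len(labels)
    log_prior = {c: math.log(class_count[c] / total) for c in classes}
    log_like = {
        c: [
            [
                math.log((state_count[c][f][s] + alpha)
                         / (class_count[c] + alpha * n_states))
                for s in range(n_states)
            ]
            for f in range(n_features)
        ]
        for c in classes
    }
    return log_prior, log_like


def classify(x, model):
    """Return the class with the highest posterior log-probability for x."""
    log_prior, log_like = model
    return max(
        log_prior,
        key=lambda c: log_prior[c] + sum(log_like[c][f][s] for f, s in enumerate(x)),
    )


# Hypothetical toy data: two features (e.g. one optical, one SAR), two states each.
model = train_naive_bayes(
    [(0, 1), (0, 1), (1, 0), (1, 0)], ["veg", "veg", "roof", "roof"], n_states=2
)
predicted = classify((0, 1), model)
```

Because every source is reduced to state indices before aggregation, the classifier never has to model the original optical or SAR statistics directly, which is the motivation given above for the discrete representation.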

Employed data
The optical and SAR data were orthorectified (using the SRTM 30 m DEM) to reduce distortions introduced by terrain. Orthorectified WorldView-2 (WV-2) and SpotLight Level-1B Product TerraSAR-X (TSX) data were used. A detailed description of the employed data is given in Table 1. WV-2 multispectral data were pansharpened by the General Fusion Framework method (Palubinskas and Reinartz, 2011). Registration of the optical and radar data was performed in ENVI using manual selection of control points. In more complicated cases other registration methods should be employed, e.g. (Suri and Reinartz, 2010). A detailed Digital Surface Model (DSM) of the urban scene is generated using the Semiglobal Matching algorithm if WorldView-2 stereo pairs or triplets with small convergence angles (less than 20 degrees) are available.

Feature extraction
Specific feature types should be extracted to provide an exhaustive description of the data. For example, a multispectral image can be used for extraction of spectral information and Difference Vegetation Index (DVI) values, while TSX data are more suitable for extraction of texture features (co-occurrence, Gabor, Laws, etc.). For some data sources (e.g. DEM) feature extraction is not carried out and the data are represented directly on the domain. The cardinality of the domain should be appropriately defined for the different features (multispectral, textural, DEM, etc.).
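As a small illustration of a spectral feature, the Difference Vegetation Index is commonly defined as the near-infrared value minus the red value of each pixel. The sketch below assumes the two bands are given as equally long flat lists of pixel values; which WV-2 bands play the roles of `nir` and `red` is not stated here and is left to the caller.

```python
def dvi(nir, red):
    """Difference Vegetation Index per pixel: DVI = NIR - Red."""
    if len(nir) != len(red):
        raise ValueError("NIR and red bands must have the same number of pixels")
    return [n - r for n, r in zip(nir, red)]


# Hypothetical pixel values: the first pixel is vegetation-like (high NIR).
values = dvi([120, 80], [40, 90])
```

Vegetation reflects strongly in the near-infrared and absorbs in the red, so larger DVI values point to vegetated pixels.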
The number of clusters used for feature representation on the finite set was 50 (for all features).
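Representation on a finite set can be sketched as a clustering step that replaces each feature value by the index of its nearest cluster center. The example below is a minimal k-means (Lloyd's algorithm) for a single scalar feature; the deterministic, evenly spaced initialisation, the iteration limit, and the function names are assumptions of this sketch, and real features may be multidimensional.

```python
def assign(values, centers):
    """Index of the nearest center for every value."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values]


def kmeans_1d(values, k, iterations=50):
    """Lloyd's k-means for a scalar feature; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centers spread evenly over the value range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        labels = assign(values, centers)
        new_centers = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    # Final labels always correspond to the returned centers.
    return centers, assign(values, centers)


# Hypothetical feature values forming two obvious groups.
centers, labels = kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.5], k=2)
```

With `k=50` the same procedure would map each feature to one of 50 states, and the resulting `labels` are the discrete inputs to the aggregation step.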
VNIR data were taken from the WV-2 multispectral image (bands 2, 3, 5, 6). This range was chosen since most very high resolution spaceborne sensors (e.g. IKONOS, Quickbird, GeoEye-1, etc.) acquire multispectral data in the VNIR range (Heldens et al., 2009). Feature representation on a finite domain allows incommensurable features and data with different statistical properties and distributions to be converted into one type of distribution (e.g. a multinomial distribution (Aksoy et al., 2005)). Fusion of multisensory data using INFOFUSE based on a neural network (OVA=90.1092, Kappa=0.8907) achieved higher accuracy than fusion and classification by a neural network with the same structure (OVA=87.0697, Kappa=0.8566). These high classification accuracies can be explained by the fact that validated ground truth is available only for limited small areas or objects (e.g. several buildings). In practice, with ground truth for a larger area, the accuracy is therefore expected to be lower.

RESULTS AND DISCUSSION
Low accuracies of the ML classification method may arise because the ML classifier cannot deal efficiently with the different distributions of the data and features, or because the multisensor data are not classified by way of consensus classification (Benediktsson et al., 1997). The low accuracy of the INFOFUSE method for classification of single source data (WV-2, 8 features) and for fusion of WV-2+DSM data (9 features) may be caused by the small size of the finite domain (i.e. the number of clusters). The loss of information during clustering therefore reduces the accuracy compared to methods dealing with the original 11-bit single source data.
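A rough way to see the scale of this information loss is to compare the maximum number of bits a single value can carry before and after clustering. The short calculation below is only an upper-bound illustration under the assumption that one cluster index replaces one 11-bit value; it is not an analysis performed in the paper.

```python
import math


def bits_per_symbol(n_levels):
    """Maximum information (in bits) carried by one of n_levels equally likely states."""
    return math.log2(n_levels)


original_bits = bits_per_symbol(2 ** 11)  # 11-bit data: 2048 possible levels
clustered_50 = bits_per_symbol(50)        # one of 50 cluster indices
clustered_100 = bits_per_symbol(100)      # one of 100 cluster indices
```

Under this assumption a 50-state representation carries fewer than 6 bits per value, and doubling the number of clusters to 100 adds exactly one bit.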
Figure 1 illustrates classification accuracy for the defined classes.
Classification results are given in Table 2 and Figure 1. Table 3 illustrates the influence of a particular feature or sensor on the proper separation of classes with similar spectral or textural properties, according to the fusion and classification strategy.

CONCLUSIONS
In this paper we present results on high resolution multisensory data fusion for classification. The developed method follows the rules of consensus theory for multisensory data fusion and allows input data (multispectral, SAR, and DSM) to be fused and classified into an extended number of classes.
The data classification is not influenced by the limitations of dimensionality, and the calculation complexity primarily depends on the step of dimensionality reduction. A special acquisition model for SAR and optical data (Palubinskas et al., 2010) will be employed in future work in order to extract as much of the available information as possible from the observed area. The model will also be employed for class-specific change detection on single and multisensory data. A more thorough validation of the method will be performed on newly available ground truth data for the test area.

Figure 1: Classification accuracy for the defined classes.

Table 2: Results for fusion and classification using multisensory data as well as for single sensors. Results of two other methods, Maximum Likelihood (ML) (not following consensus theory) and Neural Network (NN), are also given for comparison. The Neural Network employs 1 hidden layer, 40 neurons for 97, 104, or 105 features, 8 neurons for 9 features. INFOFUSE is based on a Neural Network (1 hidden layer, 40 neurons for 97, 104, or 105 features, 9 neurons for 8 features) with 50 clusters for each feature; k-means clustering was employed. For fusion and classification of single sensor data (VNIR, WV-2, WV-2+DSM) using INFOFUSE, 100 clusters were used for each feature. The ML was run in the ENVI software. Fusion and classification results for different combinations of the data and features, as well as classification using single sensor data, are given. The best classification accuracy is provided by the INFOFUSE and NN methods on the combination of the multispectral data, the Gabor texture features computed on both the TSX and optical bands, and the DSM data.

Table 3: Influence of data sources for classification of particular classes