UNCERTAIN TRAINING DATA EDITION FOR AUTOMATIC OBJECT-BASED CHANGE MAP EXTRACTION

Due to the rapid transformation of societies and the consequent growth of cities, these changes need to be studied in order to achieve better control and management of urban areas and to assist decision-makers. Change detection involves the ability to quantify temporal effects using multi-temporal data sets. Available maps of the study area are one of the most important sources for this purpose. Although old databases and maps are a great resource, training data extracted from them are very likely to contain errors, which affect the classification; editing the training samples is therefore essential. Because the study area is urban and pixel-based methods have problems in such areas, object-based classification is applied. To this end, the image is segmented at four scale levels using a multi-resolution segmentation procedure. After the segments are obtained at the required levels, training samples are extracted automatically using the existing old map. Because the map is old, these samples are uncertain and contain wrong data. To handle this issue, an editing process based on the k-nearest neighbour and k-means algorithms is proposed. Next, the image is classified in a multi-resolution object-based manner and the effects of training sample refinement are evaluated. Finally, the classified image is compared with the existing map and the changed areas are detected.


INTRODUCTION
Traditionally, pattern recognition methods have been sorted into two broad groups, supervised and unsupervised, according to the level of prior knowledge about the identity of the training samples in the problem at hand. Much of the research on supervised pattern recognition has been devoted almost entirely to analysing the characteristics of classification algorithms and to studying feature selection methods (Barandela and Gasca, 2000). Meanwhile, the accuracy of the training data is highly important for all supervised classification studies. Although most advanced non-parametric classifiers, such as neural networks, support vector machines and decision trees, can handle small inaccuracies in the training data, a classifier trained on a large proportion of clearly inaccurate or non-representative training data for extrapolation will also produce inaccurate results (Colditz et al., 2008). Recently, however, increasing emphasis has been given to evaluating the procedures used to collect and clean the training sample, a critical aspect for the effective automation of discrimination tasks. The Nearest Neighbour (NN) rule (Dasarathy, 1991) is a supervised non-parametric classifier that makes no prior assumption about the probability density functions. The performance of this classifier, as with any non-parametric method, is extremely sensitive to errors or imperfections in the training sample. The present work introduces a methodology for decontaminating imperfect Training Samples (TSs) while employing the NN rule for classification. This methodology can be regarded as a cleaning process that removes some suspicious training samples and corrects the labels of some others so that they can be retained (Barandela and Gasca, 2000). In addition, the k-means clustering method is used in this study to clean the training data, as described later. This study introduces an automated method for detecting changes in existing old maps from recent images, and for this reason particular attention is paid to the accuracy of the training data. After the image is classified and compared with the old map, the change map is extracted.

METHODOLOGY
This research aims to produce a change map using object-based classification while refining training data acquired automatically from old maps. The applied methodology, presented in Figure 1, involves preprocessing, segmentation, training data selection and edition, object-based classification and, finally, change map extraction.

Preprocessing
In this study, an IRS-P5 image and an IRS-P6 image, together with a 1:25,000-scale map of the study area, are used as input data. To prepare the input images, radiometric enhancement, registration and image fusion are carried out in the preprocessing stage.

Segmentation
Segmentation algorithms, regarded as the first step in object-based image processing, are used to subdivide the entire image from the pixel-level domain into specific image objects. Segmentation algorithms are required whenever new image object levels are to be created from the image layer information, but they are also a valuable tool for refining existing image objects by subdividing them into smaller pieces for more detailed analysis (Definiens, 2007). Multi-resolution segmentation, applicable to the pixel level or to an image object level domain, is implemented here. This segmentation applies an optimization procedure that locally minimizes the average heterogeneity of image objects for a given resolution (Definiens, 2007).

Training Data Selection
In a supervised object-based classification, training samples, usually in the form of known image segments (image objects), must be collected to train the classifier.
There are several methods for selecting training data; most of them depend on an experienced operator and on his or her knowledge of the study area. Although these methods are usually reliable and accurate, they are very time-consuming and costly.
In this study, to extract training data automatically and reduce operator intervention, a method based on extracting the data from the old map is proposed.
To extract the training data, the old map and the segmented image are overlaid, and the segments that lie under a given class in the map are selected as training data for that class.
Figure 2 shows how the map and image segments are overlaid.
Figure 2. Training sample selection process.
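The overlay rule above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the old map has been rasterised onto the same grid as the segment-ID image, and the function name, the `min_fraction` threshold and the `nodata` code are hypothetical (with `min_fraction=1.0`, only segments lying entirely under one map class are accepted).

```python
import numpy as np

def select_training_segments(segments, map_classes, min_fraction=1.0, nodata=0):
    """Label each image segment with the old-map class lying under it.

    segments     -- 2-D array of segment IDs (one ID per image object)
    map_classes  -- 2-D array of rasterised map class codes on the same grid
    min_fraction -- hypothetical threshold: share of the segment that the
                    dominant map class must cover (1.0 = entirely inside)
    nodata       -- hypothetical code for unmapped cells
    Returns {segment_id: class_code} for the segments accepted as training data.
    """
    segments = np.asarray(segments)
    map_classes = np.asarray(map_classes)
    labels = {}
    for seg_id in np.unique(segments):
        inside = segments == seg_id
        covered = map_classes[inside]
        covered = covered[covered != nodata]        # ignore unmapped cells
        if covered.size == 0:
            continue
        codes, counts = np.unique(covered, return_counts=True)
        best = counts.argmax()
        # share of the whole segment covered by the dominant map class
        if counts[best] / inside.sum() >= min_fraction:
            labels[int(seg_id)] = int(codes[best])
    return labels
```

Lowering `min_fraction` accepts more segments that straddle class boundaries, at the cost of more mixed (and possibly wrong) samples.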

Training Data Edition
Because the input image and the old map are not time-consistent, the collected training data inevitably contain some incorrect samples.
As mentioned before, wrong training data directly reduce classification accuracy, and classification accuracy in turn directly affects the extraction of image changes. To minimize the errors caused by imperfect training data, the two refinement methods described in the following sections are used.

K Nearest Neighbour
Let $TS = \{(x_1, \theta_1), (x_2, \theta_2), \ldots, (x_n, \theta_n)\}$ be the training sample of $n$ previously labelled patterns $x_i$ ($i = 1, 2, \ldots, n$), where the label $\theta_i$ may take values in $\{1, 2, \ldots, c\}$ and designates the class of $x_i$ among the $c$ possible classes. To classify an unknown pattern $X$ with the NN rule, the nearest neighbour $x'$ of $X$ in $TS$ must first be determined, that is, the $x'$ in $TS$ such that

$$d(x', X) = \min_{1 \le i \le n} d(x_i, X),$$

where $d(\cdot, \cdot)$ is any suitable metric defined in the feature space. The pattern $X$ is then assigned to the class identified by the label associated with $x'$. Devijver and Kittler (1982) have expressed that "the basic idea behind the NN rule is that samples which fall close together in feature space are likely to belong to the same class". A more graphical description is due to Dasarathy (1977): "it is like to judge a person by the company he keeps". Two other peculiarities of the NN rule have contributed to its popularity: (a) easy implementation and (b) known error-rate bounds. The computational burden of this classifier, very high with brute-force search, has been considerably reduced by developing suitable data structures and associated algorithms (e.g., Hardin and Thomson, 1992) or by reducing the size of $TS$ (e.g., Huang et al., 1995). To improve the performance of the NN rule, Wilson (1972) proposed a procedure (the edition technique) to pre-process $TS$. The algorithm has the following steps:
1. For every $x_i$ in $TS$, find the $k$ nearest neighbours of $x_i$ among the other prototypes ($k = 3$ is recommended) and the class associated with the largest number of patterns among these $k$ nearest neighbours. Ties are broken randomly whenever they occur.
2. Edit (remove) $x_i$ from $TS$ if its label does not agree with the class associated with the largest number of its $k$ nearest neighbours, as determined in step 1.
In short, editing modifies the structure of the training sample by removing (editing) suspicious training patterns; generalized variants also change the labels (re-identification) of some others.
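Wilson's two steps can be sketched as below. This is a minimal sketch, assuming the feature vectors are rows of a NumPy array and that $d$ is the Euclidean distance; the function name `wilson_edit` is hypothetical. All prototypes are judged against the original $TS$ before any of them is removed.

```python
import numpy as np
from collections import Counter

def wilson_edit(X, y, k=3, seed=None):
    """Wilson (1972) editing with Euclidean distance.

    Step 1: for every prototype x_i, find its k nearest neighbours among the
    other prototypes and their majority class (ties broken at random).
    Step 2: edit (remove) x_i if its label disagrees with that majority.
    Returns a boolean mask: True = prototype kept.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # pairwise distances; a prototype is never its own neighbour
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    keep = np.ones(len(y), dtype=bool)
    for i in range(len(y)):
        nn = np.argsort(d[i])[:k]
        votes = Counter(y[nn].tolist())
        top = max(votes.values())
        winners = [c for c, v in votes.items() if v == top]
        majority = winners[rng.integers(len(winners))]   # random tie-break
        keep[i] = majority == y[i]
    return keep
```

For two well-separated groups of points in which a single sample carries the other group's label, only that sample is removed.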

k-means:
The k-means algorithm has for many years been one of the most popular unsupervised algorithms, widely used for automatic image segmentation in remote sensing (Ball and Hall, 1967; Mather and Mather, 1976; Mather and Koch, 2004). The algorithm iteratively migrates a set of cluster means (centres) using a closest-distance-to-mean rule until the locations of the cluster means no longer change, or until the change from one iteration to the next is less than a predefined threshold. Change may also be defined in terms of the number of pixels moving from one cluster to another between iterations, or by the value of a measure of cluster compactness, such as the sum of squared deviations of each pixel from the centre of its cluster, summed over all classes. As with KNN editing, after the training data are clustered, the outliers are identified and removed from the training list.
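The paper does not state how outliers are recognised after clustering. The sketch below assumes one plausible rule, analogous to KNN editing: all training samples are clustered with plain Lloyd k-means (stopping when no centre moves more than a tolerance), and any sample whose label differs from the majority label of its cluster is removed. The function names, the number of clusters and the tolerance are assumptions.

```python
import numpy as np

def kmeans(X, k, tol=1e-6, max_iter=100, seed=0):
    """Lloyd k-means: migrate the cluster means until none moves more than tol."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # assign each sample to its closest centre
        assign = np.linalg.norm(X[:, None] - centres[None], axis=-1).argmin(axis=1)
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                        else centres[j] for j in range(k)])
        shift = np.linalg.norm(new - centres, axis=1).max()
        centres = new
        if shift < tol:                       # means (almost) unchanged: stop
            break
    assign = np.linalg.norm(X[:, None] - centres[None], axis=-1).argmin(axis=1)
    return centres, assign

def kmeans_edit(X, y, n_clusters, seed=0):
    """Hypothetical rule: drop samples whose label differs from the
    majority label of the cluster they fall in. Returns a keep mask."""
    y = np.asarray(y)
    _, assign = kmeans(X, n_clusters, seed=seed)
    keep = np.ones(len(y), dtype=bool)
    for j in np.unique(assign):
        members = assign == j
        labels, counts = np.unique(y[members], return_counts=True)
        keep[members] = y[members] == labels[counts.argmax()]
    return keep
```

Unlike KNN editing, this needs only one clustering pass rather than a neighbour search for every sample, which is consistent with the speed difference reported in the conclusions.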

Classification
After image segmentation and the collection of refined training data, the image is classified. For this purpose, minimum distance classification is used in this study. The decision rule of the minimum distance classifier assigns a pixel the label of the class centre at minimum distance, measured either by the Euclidean distance or by the Mahalanobis generalized distance. Both measures can be described as dissimilarity coefficients, in that the similarity between objects i and j increases as the distance becomes smaller. The mean spectral vector, or class centroid, of each class (plus the variance-covariance parameters if the Mahalanobis distance is used in place of the Euclidean distance) is determined from the training data sets. A pixel of unknown identity is labelled by computing the distance between its value and each class centroid in turn, and the label of the closest centroid is then assigned to the pixel. An example is shown in Figure 3, in which a pixel is allocated to class 1 (i.e., given the label "1") because its distance to the centre of class 1 is the smallest. The shape of each class region depends on which distance function (Mahalanobis or Euclidean) is used.
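A minimal sketch of the minimum distance classifier with either distance measure follows. It assumes each row of `X` is the feature vector of one image object (for example, its mean spectral values) and that there are at least two features and enough samples per class for the covariance matrices to be invertible; the function names are hypothetical.

```python
import numpy as np

def fit_centroids(X, y, mahalanobis=False):
    """Class centroids (plus inverse covariance matrices for Mahalanobis)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    inv_covs = None
    if mahalanobis:
        # needs >= 2 features and more samples than features per class
        inv_covs = np.array([np.linalg.inv(np.cov(X[y == c], rowvar=False))
                             for c in classes])
    return classes, means, inv_covs

def classify_min_distance(X, classes, means, inv_covs=None):
    """Give each sample the label of the nearest class centroid."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]          # (samples, classes, features)
    if inv_covs is None:
        d2 = np.einsum("nkf,nkf->nk", diff, diff)      # squared Euclidean
    else:
        d2 = np.einsum("nkf,kfg,nkg->nk", diff, inv_covs, diff)  # squared Mahalanobis
    return classes[d2.argmin(axis=1)]
```

Comparing squared distances gives the same nearest centroid as comparing the distances themselves, so the square root is skipped.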

Extracting change map
After the image is classified, the changed areas are extracted by comparing it with the old map. To do this, the classified image and the map are overlaid, and the areas that have changed (been removed) or remained unchanged are identified.
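The overlay can be sketched as a per-pixel comparison of two co-registered label rasters, followed by a per-segment summary. The class coding, the `nodata` value of 0 and the rule that a segment counts as changed when at least half of its pixels changed are all assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def change_map(classified, old_map, nodata=0):
    """Boolean mask of changed cells: both rasters are labelled and the labels differ."""
    classified = np.asarray(classified)
    old_map = np.asarray(old_map)
    valid = (classified != nodata) & (old_map != nodata)
    return valid & (classified != old_map)

def changed_segments(segments, change_mask, min_fraction=0.5):
    """Hypothetical rule: a segment is changed if at least min_fraction of its cells changed."""
    segments = np.asarray(segments)
    return {int(s) for s in np.unique(segments)
            if change_mask[segments == s].mean() >= min_fraction}
```

Working per segment rather than per pixel matches the object-based setting and reports whole objects, as in the red segments of the change map.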

Pre-processing
Both input images are radiometrically enhanced and then geo-referenced to the existing old map using a third-order polynomial. For the IRS-P5 image, 46 Ground Control Points (GCPs) are used, and the geo-referencing accuracy is evaluated with 7 Independent Check Points (ICPs); likewise, 52 GCPs and 6 ICPs are used for the IRS-P6 image. The IRS-P5 and IRS-P6 images are then fused to take advantage of the high spatial resolution of IRS-P5 and the high spectral resolution of IRS-P6, and all subsequent processing is applied to the fused image.
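The third-order polynomial geo-referencing and its check-point evaluation can be sketched with an ordinary least-squares fit. A full third-order 2-D polynomial has 10 coefficients per output coordinate, so 46 or 52 GCPs over-determine it comfortably. The function names are hypothetical, and the RMSE at the ICPs is assumed as the accuracy measure.

```python
import numpy as np

def poly_terms(x, y, order=3):
    """Design matrix with every term x**i * y**j, i + j <= order (10 terms for order 3)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([x**i * y**j for i in range(order + 1)
                            for j in range(order + 1 - i)])

def fit_polynomial(src, dst, order=3):
    """Least-squares polynomial mapping GCP image coordinates (src) to map coordinates (dst).
    For large pixel coordinates, centring and scaling them first improves conditioning."""
    src = np.asarray(src, dtype=float)
    A = poly_terms(src[:, 0], src[:, 1], order)
    coef, *_ = np.linalg.lstsq(A, np.asarray(dst, dtype=float), rcond=None)
    return coef                                       # shape (n_terms, 2)

def rmse(src, dst, coef, order=3):
    """Root-mean-square positional error at independent check points."""
    src = np.asarray(src, dtype=float)
    pred = poly_terms(src[:, 0], src[:, 1], order) @ coef
    return float(np.sqrt(np.mean(np.sum((pred - np.asarray(dst)) ** 2, axis=1))))
```

The ICPs are kept out of the fit so that their RMSE measures how well the transformation generalises rather than how well it reproduces the GCPs.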

Segmentation
In this step, the fused image is segmented with the multi-resolution segmentation algorithm at four levels. At each level, the scale, shape and compactness parameters are changed.
Table 2 shows the parameters that are used in these segmentations.
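The scale, shape and compactness parameters enter the merging decision of multi-resolution segmentation. The sketch below follows the fusion-value formulation commonly described for this algorithm, which is an assumption here and not taken from the paper or from the exact Definiens implementation. Colour heterogeneity is the area-weighted increase in per-band standard deviation. Shape heterogeneity combines compactness ($l/\sqrt{n}$) and smoothness ($l/b$). Two objects are merged only while the fusion value stays below the square of the scale parameter. The class and function names are hypothetical, and the caller supplies the object statistics.

```python
import math
from dataclasses import dataclass

@dataclass
class ObjStats:
    n: int              # area in pixels
    std: list           # per-band standard deviation of pixel values
    perim: float        # border length l
    bbox_perim: float   # perimeter b of the bounding box

def fusion_value(o1, o2, merged, shape=0.1, compactness=0.5, band_weights=None):
    """Increase in heterogeneity f caused by merging o1 and o2 into `merged`."""
    w = band_weights or [1.0] * len(merged.std)
    h_color = sum(wc * (merged.n * sm - (o1.n * s1 + o2.n * s2))
                  for wc, sm, s1, s2 in zip(w, merged.std, o1.std, o2.std))

    def cmpct(o):   # n * l / sqrt(n)
        return o.n * o.perim / math.sqrt(o.n)

    def smooth(o):  # n * l / b
        return o.n * o.perim / o.bbox_perim

    h_cmpct = cmpct(merged) - (cmpct(o1) + cmpct(o2))
    h_smooth = smooth(merged) - (smooth(o1) + smooth(o2))
    h_shape = compactness * h_cmpct + (1 - compactness) * h_smooth
    return (1 - shape) * h_color + shape * h_shape

def should_merge(o1, o2, merged, scale, **params):
    """Assumed stop criterion: merge only while f < scale**2."""
    return fusion_value(o1, o2, merged, **params) < scale ** 2
```

Raising the scale parameter lets more heterogeneous merges pass, which is why the coarser segmentation levels contain larger objects.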

Classification
To extract the changed areas in the study area, the image has to be classified and compared with the old map. Minimum distance classification is used for this, and it requires training data to train the classifier.

Training Data Selection:
To extract training data, the existing map and the segmented image are overlaid, and the segments lying under a given map class are extracted as training data for that class. Because the map is older than the image, the selected segments include wrong ones; in other words, some training data do not belong to the class they are labelled with, and this affects the classifier. Some of the training data selected in this step for each class, including wrongly selected samples, are shown in the figure of selected training samples before editing.

Training Data Edition:
As described before, the extracted training data contain outliers and must be refined. In this step, the training data are edited with two robust algorithms, k-nearest neighbour (KNN) and k-means. For editing with the KNN algorithm, a sample is edited if k′ of its k nearest neighbours do not belong to the same class as the sample. In this study, k′ = 3 was used with k = 20 and k = 19. For editing with the k-means algorithm, the evaluation is carried out over 3 iterations. The results are shown in Table 3 and Table 4.
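The KNN editing rule as stated here can be sketched as follows. The sketch interprets "k′ of the k nearest neighbours are not in the same class" as "at least k′ of them disagree" and "the edition is applied" as removal; both readings, the Euclidean metric and the function name are assumptions.

```python
import numpy as np

def knn_disagreement_edit(X, y, k=20, k_prime=3):
    """Edit (remove) a sample when at least k_prime of its k nearest
    neighbours (itself excluded) carry a different label. Returns a keep mask."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    k = min(k, len(y) - 1)                       # cannot exceed the number of other samples
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]            # indices of the k nearest neighbours
    n_disagree = (y[nn] != y[:, None]).sum(axis=1)
    return n_disagree < k_prime
```

With a small k′ relative to k, the rule is strict: a sample is dropped as soon as a few of its neighbours disagree, even if most agree.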

Change Map Extraction
In this step, the classified image and the map are compared and the changes in the area are extracted. To do this, the most accurate classification is overlaid on the map, and the areas that have changed or remained unchanged are determined.
Figure 8 shows the change map.

CONCLUSIONS
In this paper, we presented a change detection method with automatic training data selection. Because of the selection procedure, these training data contain many outliers and therefore have to be edited. The editing process considerably improved the classification results, and the best classification result was obtained at segmentation level III. For the editing process, we presented two robust methods, KNN and k-means. Editing with the KNN algorithm improved the classification by more than 20% at best, but this method is very time-consuming. The k-means method gave an improvement of 27% and is fast and reliable.
In a generalized version of the editing procedure, the k nearest neighbours of each prototype $x_i$ in $TS$ are searched in the remainder of $TS$. If a particular class has at least k′ representatives among these k nearest neighbours, $x_i$ is labelled according to that class, independently of its original label; otherwise, $x_i$ is edited (removed).
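A sketch of this relabel-or-remove rule follows, assuming Euclidean distance, feature vectors as rows of a NumPy array and a hypothetical function name. Choosing k′ > k/2 guarantees that at most one class can qualify; labels are decided from the original $TS$ for all prototypes at once.

```python
import numpy as np
from collections import Counter

def generalized_edit(X, y, k=3, k_prime=2):
    """Relabel a prototype to any class with at least k_prime of its k nearest
    neighbours (searched in the rest of TS); otherwise remove it.
    Returns (keep_mask, new_labels)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                   # search only the remainder of TS
    nn = np.argsort(d, axis=1)[:, :k]
    keep = np.zeros(len(y), dtype=bool)
    new_y = y.copy()
    for i in range(len(y)):
        label, votes = Counter(y[nn[i]].tolist()).most_common(1)[0]
        if votes >= k_prime:
            keep[i] = True
            new_y[i] = label                      # may differ from the original label
    return keep, new_y
```

Unlike plain Wilson editing, a mislabelled sample deep inside another class is recovered with a corrected label instead of being discarded.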

Figure 4 shows the results of these segmentations.

Figure 5. Part of the study area.
Figure 7 shows the cleaned training samples in the image.
Selected training samples for each class before editing. (a), (b), (c) and (d) show the training data for segmentation levels I, II, III and IV, respectively.

Figure 8. Changed areas. Red segments show the areas that have changed (been removed).

Table 1 shows the results of the image geocoding.

Table 3. Overall accuracies of classifications trained with training data edited by the KNN algorithm

Table 4. Overall accuracies of classifications trained with training data edited by the k-means algorithm