Remote Sensing Image Classification Applied to the First National Geographical Information Census of China

Image classification still has a long way to go, although it has been studied for almost half a century. Researchers have achieved a great deal in the image classification domain, but a considerable gap remains between theory and practice. New methods from artificial intelligence can be absorbed into image classification so that the strengths of each offset the weaknesses of the other, which opens up new prospects. Networks often play the role of a high-level language in artificial intelligence and statistics, because they are used to build complex models from simple components. In recent years, Bayesian networks, one kind of probabilistic network, have become a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Tree Augmented Naive Bayesian Networks (TAN) to texture classification of high-resolution remote sensing images and propose a new method that constructs the network topology according to the training accuracy obtained on the training samples. Since 2013, the Chinese government has been carrying out the first national geographical information census project, which mainly interprets geographical information from high-resolution remote sensing images. This paper therefore applies Bayesian networks to remote sensing image classification in order to improve image interpretation in the census project. The experiments use remote sensing images of Beijing. The results demonstrate that TAN outperforms the Naive Bayesian Classifier (NBC) and the Maximum Likelihood Classification method (MLC) in overall classification accuracy. In addition, the proposed method can reduce the workload of field workers and improve work efficiency. Although it is time-consuming, it is an attractive and effective method for assisting the office operation of image interpretation.


INTRODUCTION
On February 28th, 2013, the State Council of China launched a national project, the first national geographical information census, which aims to acquire the latest geographical information on land cover and land use across the nation from high-resolution remote sensing images between 2013 and 2015. The State Council authorized the National Administration of Surveying, Mapping and Geoinformation to implement the project at the national level, while local governments authorized the local administrations of Surveying, Mapping and Geoinformation to implement it at the local level. Last year, the local administrations first submitted the project results to the local governments, which then submitted them to the national administration. After the national administration checked and accepted the results, they were finally submitted to the State Council.
The project defines ten first-level classes, forty second-level classes and seventy-seven third-level classes of land cover geographical information. Different office interpreters are familiar with different classes and have mastered the interpretation features of different classes, so it is very difficult for any one interpreter to master the interpretation features of all classes, and even more difficult to determine the exact class attribute among easily confused classes from high-resolution remote sensing images. Even so, office interpreters can usually delineate the borders between different classes on the images, although they cannot determine the class attribute. Through the office operators' delineation we therefore obtain irregular polygons whose class attribute is still undetermined. In this paper, we view these irregular polygons on the high-resolution remote sensing images as objects, which are very similar to the objects obtained automatically by image segmentation algorithms in the image processing field. However, objects delineated by office operators are more accurate and meaningful than objects obtained automatically by image segmentation algorithms. This paper therefore proposes a new object-oriented image classification method based on Bayesian networks and high-resolution remote sensing images to help office interpreters determine which class an object belongs to. The experimental results show that the proposed method improves the efficiency of image interpretation for office interpreters and, to some extent, reduces the workload of field workers.
This paper is organized as follows. Section 2 reviews some basic concepts of Tree Augmented Naive Bayesian Networks (TAN), and Section 3 introduces how Bayesian networks are applied to land cover image classification. Section 4 describes the experiments and tests on high-resolution remote sensing images based on TAN. Finally, Section 5 draws some conclusions.

BAYESIAN NETWORKS
In this section, we briefly introduce some basic concepts of Bayesian networks and then apply them to land cover classification of high-resolution remote sensing images.

Basic concepts
The concept of Bayesian networks was proposed by Pearl in 1986, but it received little attention until researchers found that the Naive Bayesian network performed well when applied to image classification (Friedman, 1997). Bayesian networks combine Bayesian probability and graph theory. On the one hand, the graph qualitatively and intuitively describes the dependence relationships among nodes (variables); on the other hand, the conditional probabilities quantitatively describe those dependence relationships. Figure 1 gives an example.

Inference Model
Bayesian networks come in several kinds, distinguished by structure and complexity, of which the Naive Bayesian network is the simplest. It assumes that all child nodes are conditionally independent given the parent node, although this conditional independence assumption is rarely met in practice. Because the features extracted from high-resolution remote sensing images are usually correlated, we select Tree Augmented Naive Bayesian Networks (TAN) for object-oriented image classification. The inference formula is as follows; for details of the inference model, see (YU Xin, 2007).
Here, X denotes the different features extracted from high-resolution remote sensing images.
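For reference, the standard TAN posterior can be written as below. This is the textbook form of the model (Friedman, 1997) and not necessarily the exact expression used in (YU Xin, 2007). The symbols are assumptions introduced here: c is a value of the class node C, x_i is the observed value of feature X_i, and \pi(i) is the index of the single feature parent of X_i in the tree. For the root feature, which has no feature parent, the factor reduces to P(x_i | c).

```latex
P(C = c \mid x_1, \dots, x_n)
  \;\propto\; P(c) \prod_{i=1}^{n} P\bigl(x_i \mid c,\, x_{\pi(i)}\bigr),
\qquad
\hat{c} \;=\; \arg\max_{c}\; P(c) \prod_{i=1}^{n} P\bigl(x_i \mid c,\, x_{\pi(i)}\bigr)
```

If every feature has no feature parent, the product reduces to the Naive Bayesian classifier, which makes the difference between NBC and TAN explicit.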

Processing steps
To describe how the proposed method is applied to object-oriented image classification, the main processing steps are as follows.
① Randomly select some samples from the sample database as training samples.
② Extract the statistical and structural texture features of the training samples.
③ Learn the parameters of the Bayesian network from prior information and the training samples.
④ Randomly select some samples from the sample database as test samples and extract the same texture features as for the training samples.
⑤ Calculate the posterior probability of each test sample with the inference model.
⑥ Determine the class attribute by the maximum posterior probability principle.
⑦ Evaluate the classification results with a confusion matrix to obtain the classification accuracy.
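The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: a Gaussian Naive Bayesian classifier stands in for the TAN model, step ② is skipped because the feature vectors are synthetic, and all names (split_samples, fit_gaussian_nb, CENTRES and so on) are hypothetical.

```python
import math
import random

def split_samples(samples, n_train, rng):
    """Steps 1 and 4: random training samples; the rest serve as test samples."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def fit_gaussian_nb(train):
    """Step 3: class priors plus per-feature mean and variance (NBC stand-in for TAN)."""
    by_class = {}
    for features, label in train:
        by_class.setdefault(label, []).append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / len(train), stats)
    return model

def log_posterior(model, features):
    """Step 5: unnormalised log posterior of every class."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores

def classify(model, features):
    """Step 6: maximum posterior probability decision."""
    scores = log_posterior(model, features)
    return max(scores, key=scores.get)

def confusion_matrix(model, test, labels):
    """Step 7: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for features, label in test:
        matrix[index[label]][index[classify(model, features)]] += 1
    return matrix

def overall_accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# Synthetic two-feature samples; the class centres are invented for illustration.
rng = random.Random(0)
LABELS = ["coniferous", "broad-leaved", "mixed"]
CENTRES = {"coniferous": (0.2, 0.8), "broad-leaved": (0.8, 0.2), "mixed": (0.5, 0.5)}
samples = [([rng.gauss(m, 0.05) for m in CENTRES[label]], label)
           for label in LABELS for _ in range(40)]
train, test = split_samples(samples, 60, rng)
model = fit_gaussian_nb(train)
matrix = confusion_matrix(model, test, LABELS)
accuracy = overall_accuracy(matrix)
print(matrix, accuracy)
```

Working with log probabilities avoids numerical underflow when many features are multiplied together. Replacing fit_gaussian_nb and log_posterior with TAN counterparts would leave the remaining steps unchanged.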

EXPERIMENTS AND ANALYSIS
This section briefly introduces the experimental data, the results and their analysis.

Experimental data
The experimental data come from the remote sensing images of the first national geographical information census project. They are colour remote sensing images acquired in the fall of 2013. The spatial resolution is 0.19 meters in the plain area and 0.48 meters in the mountain area. The three classes coniferous forest, broad-leaved forest and mixed broadleaf-conifer forest are the most easily confused and the most difficult for office interpreters to distinguish on colour high-resolution remote sensing images. To show the performance of the proposed method, we therefore consider only these three classes in the experiments. An example of each class is displayed in Figure 3 (A sample of each class), where the red outline marks the border of one class as manually delineated by an office interpreter. In Table 1, N in the first row is the number of training samples (from 5 to 30) and the first column lists the classification methods (MLC, NBC and TAN). From Table 1, we can draw the following conclusions.
(1) The proposed method is effective and efficient.
(2) As the number of training samples increases, the classification accuracy of all three methods increases. Once the number of training samples exceeds 20, the classification accuracy of the three methods becomes relatively stable.
(3) MLC and NBC achieve almost the same classification accuracy, and both are less accurate than the proposed method.

CONCLUSIONS
During the national geographical information census in China, office interpreters find it easy to delineate the borders between different classes but difficult to determine the class attribute. We therefore regard each irregular polygon within a border manually delineated by office interpreters as one object, or one minimum classification unit, and propose a new object-oriented land cover image classification method based on Bayesian networks and high-resolution remote sensing images to help office interpreters recognize or determine the class attribute. Experiments and analysis show that the proposed method is effective and efficient and that it performs better than MLC and NBC. In addition, the proposed method can reduce the workload of field workers and improve work efficiency.

Figure 1. An example of Bayesian network
In Figure 1 (David Heckerman, 1997), the Burglary, Earthquake, Alarm, JohnCalls and MaryCalls nodes represent five random variables. The directed edge from the Burglary node to the Alarm node represents the dependence between them: the Burglary node is the parent node of the Alarm node and, conversely, the Alarm node is the child node of the Burglary node. Likewise, the Alarm node is the child node of the Earthquake node and the parent node of the JohnCalls and MaryCalls nodes. The conditional probability tables in Figure 1 quantify these dependence relationships. For example, the second row of the table in the middle of Figure 1 states that the probability of Alarm is 0.95 given that Burglary and Earthquake both occur. The first conditional probability table of Figure 1 (top left) gives the prior probability of Burglary, that is, the probability that Burglary happens without any other known information.
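How the graph and its conditional probability tables combine can be illustrated with a short Python sketch. Only the value 0.95 for Alarm given Burglary and Earthquake is stated in the text; every other probability below is an assumption taken from the commonly used textbook version of this example, and the function names are hypothetical.

```python
from itertools import product

# Prior probabilities (assumed textbook values, not given in the text).
P_BURGLARY = 0.001
P_EARTHQUAKE = 0.002
# P(Alarm = true | Burglary, Earthquake); only the 0.95 entry comes from the text.
P_ALARM = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
# P(JohnCalls = true | Alarm) and P(MaryCalls = true | Alarm) (assumed values).
P_JOHN = {True: 0.90, False: 0.05}
P_MARY = {True: 0.70, False: 0.01}

def bernoulli(p_true, value):
    """Probability of a binary outcome given the probability of 'true'."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Joint probability as the product of each node given its parents."""
    return (bernoulli(P_BURGLARY, b) * bernoulli(P_EARTHQUAKE, e)
            * bernoulli(P_ALARM[(b, e)], a)
            * bernoulli(P_JOHN[a], j) * bernoulli(P_MARY[a], m))

def posterior_burglary(j, m):
    """P(Burglary = true | JohnCalls = j, MaryCalls = m) by full enumeration."""
    weights = {True: 0.0, False: 0.0}
    for b, e, a in product([True, False], repeat=3):
        weights[b] += joint(b, e, a, j, m)
    return weights[True] / (weights[True] + weights[False])

total = sum(joint(*values) for values in product([True, False], repeat=5))
p_burglary_given_calls = posterior_burglary(True, True)
print(total, p_burglary_given_calls)
```

The joint distribution over five binary variables has 32 entries, yet the network specifies it with only ten numbers, which is the practical benefit of encoding the dependence structure in a graph.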
Feature description and extraction play an important role in image analysis and understanding. Texture features are commonly extracted, and they are divided into statistical texture features and structural texture features. For statistical texture features, this paper extracts the angular second moment, entropy and inverse difference moment from the Gray-Level Co-occurrence Matrix. The corresponding formulas are below. For structural texture features, this paper extracts features based on the Symlets wavelet transform. For more details, see (YU Xin, 2008).
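The three statistical texture features can be sketched as follows, using their standard definitions over a normalised co-occurrence matrix p(i, j): ASM = Σ p(i, j)², entropy = -Σ p(i, j) log p(i, j), and IDM = Σ p(i, j) / (1 + (i - j)²). The single pixel offset, the non-symmetric matrix, the four gray levels, the natural logarithm and the toy image are all assumptions made for illustration; the settings used in this paper are not stated.

```python
import math

def glcm(image, dx=1, dy=0, levels=4):
    """Normalised gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]

def angular_second_moment(p):
    """Sum of squared entries; large for uniform textures."""
    return sum(v * v for row in p for v in row)

def entropy(p):
    """Shannon entropy of the matrix (natural logarithm); large for irregular textures."""
    return -sum(v * math.log(v) for row in p for v in row if v > 0)

def inverse_difference_moment(p):
    """Weights entries near the diagonal; large for locally homogeneous textures."""
    return sum(v / (1 + (i - j) ** 2)
               for i, row in enumerate(p) for j, v in enumerate(row))

# Toy 4 x 4 image with gray levels 0..3 (hypothetical data).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(image)
asm = angular_second_moment(p)
ent = entropy(p)
idm = inverse_difference_moment(p)
print(asm, ent, idm)
```

In practice the image is usually quantised to a small number of gray levels first, and the features are averaged over several offsets and directions to reduce sensitivity to orientation.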

Figure 2 is an example of Bayesian networks applied to land cover image classification.

Figure 2. Bayesian network applied to image classification
In Figure 2, the node C is the class attribute node (or variable), and X_i describes one kind of texture feature, either statistical or structural, extracted from a classification unit or object. According to the definition of Bayesian network,
Figure 3. A sample of each class (coniferous forest, broad-leaved forest, mixed broadleaf-conifer forest)

Results analysis
During land cover interpretation, both the number of training samples and the classification method influence the classification accuracy to some extent. In the experiments, we therefore compare the proposed method (Tree Augmented Naive Bayesian Networks, TAN) with the maximum likelihood classification method (MLC), which is the most commonly used method in the image classification field, and the Naive Bayesian network classification method (NBC), which is the simplest Bayesian network method, in terms of classification accuracy. In addition, we test the classification accuracy for different numbers of training samples, ranging from 5 to 30.
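The evaluation protocol, in which accuracy is measured while the number of training samples grows from 5 to 30, can be sketched as below. This is a hedged illustration only: the data are synthetic, N is assumed to be the number of training samples per class, a one-feature Gaussian maximum likelihood classifier stands in for the three methods compared in this paper, and the constants CLASS_MEANS and SIGMA are invented. The numbers it produces say nothing about the results in Table 1.

```python
import math
import random

# Hypothetical class means and spread of one synthetic texture feature.
CLASS_MEANS = {"coniferous": 0.30, "broad-leaved": 0.50, "mixed": 0.40}
SIGMA = 0.08

def draw(rng, n_per_class):
    """Draw n_per_class labelled feature values for every class."""
    return [(rng.gauss(mean, SIGMA), label)
            for label, mean in CLASS_MEANS.items() for _ in range(n_per_class)]

def fit(train):
    """Estimate the mean and variance of the feature for each class."""
    params = {}
    for label in CLASS_MEANS:
        values = [x for x, y in train if y == label]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-6
        params[label] = (mean, var)
    return params

def predict(params, x):
    """Choose the class with the largest Gaussian log-likelihood (equal priors)."""
    def loglik(label):
        mean, var = params[label]
        return -0.5 * math.log(var) - (x - mean) ** 2 / (2 * var)
    return max(params, key=loglik)

def accuracy_for(n_train, rng, n_test=200):
    """Train on n_train samples per class and score on fresh test samples."""
    params = fit(draw(rng, n_train))
    test = draw(rng, n_test)
    return sum(predict(params, x) == y for x, y in test) / len(test)

rng = random.Random(2013)
curve = {n: accuracy_for(n, rng) for n in range(5, 31, 5)}
for n, acc in curve.items():
    print(n, round(acc, 3))
```

Because each run uses one random draw, a curve like this is noisy; averaging over repeated random selections of training samples gives a more reliable picture of when accuracy becomes stable.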