BALANCED VS IMBALANCED TRAINING DATA: CLASSIFYING RAPIDEYE DATA WITH SUPPORT VECTOR MACHINES

Ustuner, M.; Sanli, F. B.; Abdikan, S.

doi:10.5194/isprs-archives-XLI-B7-379-2016

Articles | Volume XLI-B7

https://doi.org/10.5194/isprs-archives-XLI-B7-379-2016

© Author(s) 2016. This work is distributed under
the Creative Commons Attribution 3.0 License.

https://doi.org/10.5194/isprs-archives-XLI-B7-379-2016

© Author(s) 2016. This work is distributed under
the Creative Commons Attribution 3.0 License.

Articles | Volume XLI-B7

21 Jun 2016

| 21 Jun 2016

BALANCED VS IMBALANCED TRAINING DATA: CLASSIFYING RAPIDEYE DATA WITH SUPPORT VECTOR MACHINES

M. Ustuner, F. B. Sanli, and S. Abdikan

Keywords: Land cover classification, Imbalanced training data, Support Vector Machines, RapidEye, Agriculture

Abstract. The accuracy of supervised image classification is highly dependent upon several factors such as the design of training set (sample selection, composition, purity and size), resolution of input imagery and landscape heterogeneity. The design of training set is still a challenging issue since the sensitivity of classifier algorithm at learning stage is different for the same dataset. In this paper, the classification of RapidEye imagery with balanced and imbalanced training data for mapping the crop types was addressed. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifications were implemented here to classify the data. For evaluating the influence of the balanced and imbalanced training data on image classification algorithms, three different training datasets were created. Two different balanced datasets which have 70 and 100 pixels for each class of interest and one imbalanced dataset in which each class has different number of pixels were used in classification stage. Results demonstrate that ML and NN classifications are affected by imbalanced training data in resulting a reduction in accuracy (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for NN) while SVM is not affected significantly (from 94.38% to 94.69%) and slightly improved. Our results highlighted that SVM is proven to be a very robust, consistent and effective classifier as it can perform very well under balanced and imbalanced training data situations. Furthermore, the training stage should be precisely and carefully designed for the need of adopted classifier.

BALANCED VS IMBALANCED TRAINING DATA: CLASSIFYING RAPIDEYE DATA WITH SUPPORT VECTOR MACHINES

Useful Links

Useful External Links

Our Contact