ASSESSMENT OF NORMALIZATION TECHNIQUES ON THE ACCURACY OF HYPERSPECTRAL DATA CLUSTERING
Keywords: K-means clustering, normalization techniques, density-based initialization, hyperspectral data
Abstract. Partitioning clustering algorithms, such as k-means, are the most widely used clustering algorithms in the remote sensing community. They identify clusters within multidimensional data based on a similarity measure (SM). SMs assign more weight to features with large ranges than to those with small ranges, so small-range features are suppressed by large-range features and have little or no effect on the clustering procedure. This problem becomes worse for high-dimensional data such as hyperspectral remotely sensed images. Feature normalization (FN) can be used to address it. However, different FN methods perform differently, so this study examined the effects of ten FN methods on hyperspectral data clustering. The proposed approach was applied to both real and synthetic hyperspectral datasets. The evaluations demonstrated that FN can lead to better results than clustering without FN. More importantly, the results showed that rank-based FN, with improvements of 15.7% and 12.8% on the synthetic and real datasets, respectively, can be considered the best FN method for hyperspectral data clustering.
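The abstract does not list the ten FN methods or define its rank-based variant. The following is therefore only a minimal sketch under stated assumptions. It applies three common per-band normalizations (min-max, z-score, and a rank transform that replaces each value with its average rank rescaled to [0, 1]) to a small pixel-by-band matrix. It then compares per-band squared differences, the terms that make up a squared Euclidean distance. All function names and the sample values are hypothetical and chosen for illustration.

```python
import statistics

# Hypothetical helpers: each takes one feature (band) as a list of values
# across all pixels and returns the normalized values in the same order.

def min_max(column):
    """Rescale a band linearly to [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]

def z_score(column):
    """Center a band at zero mean and scale it to unit population std."""
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [0.0 if sd == 0 else (v - mu) / sd for v in column]

def rank_based(column):
    """Assumed rank transform: average rank (ties share it), scaled to [0, 1]."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and column[order[j + 1]] == column[order[i]]:
            j += 1
        avg = (i + j) / 2  # average 0-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [0.0 if n == 1 else r / (n - 1) for r in ranks]

def normalize(samples, method):
    """Apply a per-band method to samples (rows = pixels, columns = bands)."""
    columns = [list(c) for c in zip(*samples)]
    scaled = [method(c) for c in columns]
    return [list(row) for row in zip(*scaled)]

def squared_diffs(a, b):
    """Per-band terms of the squared Euclidean distance between two pixels."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

# Hypothetical pixels: band 0 spans about 1000 units, band 1 spans less than 1.
pixels = [[1000.0, 0.10], [1010.0, 0.90], [2000.0, 0.15]]
print("raw:", squared_diffs(pixels[0], pixels[1]))
mm = normalize(pixels, min_max)
print("min-max:", squared_diffs(mm[0], mm[1]))
```

In the raw data, the first two pixels differ substantially in band 1 (0.10 versus 0.90), yet band 0 contributes 100 to their squared distance while band 1 contributes only 0.64. After min-max scaling, band 1's difference becomes the larger term. This illustrates the range effect the abstract describes. The sketch does not reproduce the study's experiments or its reported improvements.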