PERFORMANCE COMPARISON OF EVOLUTIONARY ALGORITHMS FOR IMAGE CLUSTERING

Evolutionary computation tools are able to process real-valued numerical sets in order to extract suboptimal solutions of a designed problem. Data clustering algorithms have been used intensively for image segmentation in remote-sensing applications. Despite the wide usage of evolutionary algorithms in data clustering, their clustering performances have been scarcely studied by using clustering validation indexes. In this paper, recently proposed evolutionary algorithms (i.e., the Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA) and Backtracking Search Optimization Algorithm (BSA)) and some classical image clustering techniques (i.e., k-means, FCM and SOM networks) have been used to cluster images, and their performances have been compared by using four clustering validation indexes. Experimental test results exposed that evolutionary algorithms give more reliable cluster centers than classical clustering techniques, but their convergence time is quite long.


INTRODUCTION
Image clustering (Wang & Wang, 2009; Chen et al., 2002; Halkidi et al., 2001) is a quite important unsupervised learning tool and one of the most intensively used image segmentation operators. Image clustering is used for the segmentation of pixels into groups according to predefined objective functions. Objective functions are generally designed to minimize the cumulative distances between pixels. One of the distance computation methods (e.g., Euclidean distance, Minkowski distance, Manhattan distance, Mahalanobis distance or derivatives of these) can be used in order to compute the distances between pixels.
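For example, the first three of these distance measures can be computed as follows (a minimal Python sketch for illustration, rather than the Matlab environment used later in the paper; function names are illustrative):

```python
import numpy as np

def euclidean(a, b):
    # Straight-line (L2) distance between two pixel feature vectors.
    return float(np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2)))

def manhattan(a, b):
    # City-block (L1) distance: sum of absolute coordinate differences.
    return float(np.sum(np.abs(np.asarray(a) - np.asarray(b))))

def minkowski(a, b, p=3):
    # Generalizes both: p=1 gives Manhattan, p=2 gives Euclidean.
    return float(np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1.0 / p))

# Two RGB pixels treated as 3-dimensional feature vectors.
x, y = [10, 20, 30], [13, 24, 30]
print(euclidean(x, y))   # 5.0
print(manhattan(x, y))   # 7.0
```

The Mahalanobis distance additionally requires the inverse covariance matrix of the data, so it is omitted from this sketch.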
The clustering process involves seven basic steps: (1) detection of the optimum/suboptimum number of clusters, (2) data selection, (3) data modeling, (4) setting the objective function, (5) selection of the clustering method, (6) computation, and (7) interpretation and validation.
Clustering methods can be classified into three groups: error-minimizing methods, probability-based methods and graph-based methods. There are many clustering methods introduced in the literature (Kasturi et al., 2003; Sharan et al., 2003; Shu et al., 2003). The most intensively used methods are k-means, fuzzy c-means, ISODATA, decision trees, mean shift, hierarchical clustering, Gaussian mixture models, and unsupervised artificial neural networks.
In the literature, some analytic methods have been proposed for the detection of the optimum cluster number, k, but it is still a difficult problem. In this paper, the Gap-Statistics and Calinski-Harabasz indexes have been used in order to detect k. The Gap method is based on a statistical comparison between the within-cluster dispersion produced by an arbitrary clustering technique and the estimated dispersion of within-cluster pixels (Wang & Wang, 2009; Chen et al., 2002; Halkidi et al., 2001; Kasturi et al., 2003; Sharan et al., 2003; Shu et al., 2003; Zhao and Karypis, 2005; Xu and Wunsch, 2005; Halkidi et al., 2005).

DATA CLUSTERING
In this section, the classical data clustering methods and Gap-Statistics have been explained briefly.

k-Means Clustering
K-Means is one of the simplest and most powerful unsupervised learning methods used for data clustering. It is quite popular in pattern recognition and in cluster analysis for data mining applications. K-Means partitions the observed data into k clusters by minimizing the within-cluster sum of squares given in Eq. 1:

J = Σ_{j=1..k} Σ_{xᵢ ∈ Sⱼ} ‖xᵢ − μⱼ‖²    (1)

where Sⱼ denotes the j-th cluster and μⱼ its center.
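A minimal sketch of this procedure (Lloyd's alternating assignment/update iteration, written in Python for illustration rather than the Matlab environment used for the experiments; all names are illustrative):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal Lloyd's k-means: alternates assignment and centroid update."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated 1-D groups of pixel intensities.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
centers, labels = kmeans(X, k=2)
```

Each iteration cannot increase the within-cluster sum of squares, so the loop terminates at a local optimum, which is why the initial center choice matters in practice.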

Fuzzy C-Means Algorithm (FCM)
FCM is an unsupervised learning method which allows one piece of data to belong to two or more clusters. FCM is frequently used in pattern recognition, image segmentation and computer vision applications. FCM is based on the minimization of the function given in Eq. 2:

J_m = Σ_{i=1..N} Σ_{j=1..C} u_ij^m ‖xᵢ − cⱼ‖²    (2)

where m > 1 is the fuzzification exponent, u_ij is the degree of membership of xᵢ in cluster j, and cⱼ is the j-th cluster center. The memberships and cluster centers are calculated by using Eqs. 3-4:

u_ij = 1 / Σ_{l=1..C} ( ‖xᵢ − cⱼ‖ / ‖xᵢ − c_l‖ )^{2/(m−1)}    (3)

cⱼ = Σ_{i=1..N} u_ij^m xᵢ / Σ_{i=1..N} u_ij^m    (4)

The partitioning process of FCM is finalized when the condition defined by Eq. 5 is satisfied:

max_ij | u_ij^{(k+1)} − u_ij^{(k)} | < ε    (5)

where k is the iteration number and ε ∈ [0, 1] is the threshold value used for finalizing the calculations.
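The update scheme above can be sketched as follows (a minimal Python illustration assuming the standard FCM update rules; all names are illustrative):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=200, seed=0):
    """Minimal fuzzy c-means: alternates membership and center updates
    until the largest membership change falls below eps (the Eq. 5 test)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]     # Eq. 4 analogue
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                              # avoid divide-by-zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)       # Eq. 3 analogue
        converged = np.abs(U_new - U).max() < eps          # Eq. 5 stop test
        U = U_new
        if converged:
            break
    return centers, U

X = np.array([[0.0], [1.0], [9.0], [10.0]])
centers, U = fcm(X, c=2)
```

Unlike k-means, every point keeps a graded membership in every cluster, which is what makes FCM attractive for pixels lying near segment boundaries.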

Self Organizing Map Artificial Neural Network (SOM)
SOM is a competitive learning tool which has been proposed for data clustering. SOM is an iterative unsupervised learning tool, and it has been used intensively in pattern recognition and data clustering applications. SOM transforms the training data samples into topologically ordered maps, and it is analogous to the generalized principal component transform in that it produces a topographic map of the input patterns. At each training iteration, an input x is randomly selected and the distances between x and the SOM weight vectors are recomputed. The winner unit b is the vector closest to x, as expressed by Eq. 6:

‖x − w_b‖ = min_i ‖x − w_i‖    (6)

The weight vectors are then updated by using Eq. 7:

w_i(t+1) = w_i(t) + α(t) h_bi(t) [x(t) − w_i(t)]    (7)

where t, α(t) and h_bi(t) denote the time, the adaptation coefficient and the neighborhood function centered on the winner unit b, respectively.
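One training step can be sketched as follows (a minimal Python illustration assuming a 1-D map, a Gaussian neighborhood and exponentially decaying schedules for α(t) and the neighborhood width; these schedules are illustrative assumptions, not the paper's exact settings):

```python
import numpy as np

def som_step(weights, x, t, alpha0=0.5, sigma0=1.0, tau=100.0):
    """One SOM training step on a 1-D map: find the winner (Eq. 6)
    and pull units towards x, weighted by the neighborhood function."""
    alpha = alpha0 * np.exp(-t / tau)                   # decaying learning rate
    sigma = sigma0 * np.exp(-t / tau)                   # shrinking neighborhood
    d = np.linalg.norm(weights - x, axis=1)
    b = int(d.argmin())                                 # winner unit index
    idx = np.arange(len(weights))
    h = np.exp(-((idx - b) ** 2) / (2.0 * sigma ** 2))  # neighborhood h_bi(t)
    weights += alpha * h[:, None] * (x - weights)       # update (Eq. 7)
    return b

rng = np.random.default_rng(1)
weights = rng.random((5, 2))
b = som_step(weights, np.array([0.9, 0.9]), t=0)
```

Because h_bi(t) also moves the winner's map neighbors, nearby units end up representing similar inputs, which is what produces the topologically ordered map.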

Gap-Statistics
There are some analytical methods proposed for detecting the optimal number of clusters, such as the Silhouette index, Davies-Bouldin index, Calinski-Harabasz index, Dunn index, C index, Krzanowski-Lai index, Hartigan index, weighted inter-intra index and Gap-Statistics. Gap-Statistics is quite sensitive to the statistical properties of the data; therefore, it is used intensively for analyzing data clustering quality. The gap value is defined by using Eq. 8:

Gap_n(k) = E*_n{log(W_k)} − log(W_k)    (8)

where n and k denote the sample size and the cluster number to be tested. W_k is defined by using Eq. 9:

W_k = Σ_{r=1..k} D_r / (2 n_r)    (9)

where D_r is the sum of the pairwise distances between the points in cluster r and n_r is the number of points in cluster r. The E*_n{log(W_k)} values have been computed by using a Monte Carlo sampling-based statistical method. The number of clusters is then selected by using Eq. 10:

K = the smallest K such that Gap(K) ≥ GAPMAX − SE(GAPMAX)    (10)

where K is the number of clusters, Gap(K) is the gap value for the clustering solution with K clusters, GAPMAX is the largest gap value, and SE(GAPMAX) is the standard error corresponding to the largest gap value.
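The computation of W_k and Gap(k) can be sketched as follows (a minimal Python illustration assuming a uniform reference distribution over the bounding box of the data; `cluster_fn` is an illustrative placeholder for any clustering routine that returns integer labels):

```python
import numpy as np

def within_dispersion(X, labels, k):
    """W_k of Eq. 9: per-cluster sum of pairwise distances
    divided by twice the cluster size, summed over clusters."""
    W = 0.0
    for r in range(k):
        pts = X[labels == r]
        if len(pts) < 2:
            continue
        D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)).sum()
        W += D / (2.0 * len(pts))
    return W

def gap(X, cluster_fn, k, B=10, seed=0):
    """Gap(k) of Eq. 8: expected log-dispersion under a uniform reference
    distribution minus the observed log-dispersion, with the expectation
    estimated from B Monte Carlo reference samples."""
    rng = np.random.default_rng(seed)
    logW = np.log(within_dispersion(X, cluster_fn(X, k), k))
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref = [np.log(within_dispersion(Z, cluster_fn(Z, k), k))
           for Z in (rng.uniform(lo, hi, size=X.shape) for _ in range(B))]
    return float(np.mean(ref) - logW)
```

A large positive gap means the data is much more tightly clustered than uniform noise would be for the same k, which is exactly the evidence Eq. 10 accumulates when selecting K.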

EVOLUTIONARY ALGORITHMS
In this section, some of the popular evolutionary algorithms, namely ABC, GSA, CS, JADE, DSA and BSA (Civicioglu, 2012, 2013a, 2013b; Civicioglu and Besdok, 2013), have been explained briefly.

ABC (Artificial Bee Colony Algorithm) analogically simulates the nectar-source search behavior of honey bees. It is a population-based evolutionary search algorithm that has a two-phase search strategy. Its problem-solving success for multimodal problems is limited because ABC's search strategy is elitist. ABC has no crossover operator but has two control parameters. Its mutation strategy is rather similar to that of DE.

GSA (Gravitational Search Algorithm) is an evolutionary search algorithm inspired by the universal law of gravitation. Random solutions of the problem are modeled as artificial bodies that apply Newtonian gravitational force to each other. The mass of an artificial body and the quality of the solution that the body provides for the problem are related to each other: when the quality of the solution is higher, the speed at which the artificial body abandons that position becomes slower, due to the gravitational force applied to it by the other bodies. In the search space, the speed of the artificial bodies with inferior solutions is higher. This allows GSA to search the space efficiently for a solution of the problem.

CS (Cuckoo Search Algorithm) is a population-based algorithm that has an elitist stochastic search strategy. CS tends to evolve each random solution towards the best solution obtained beforehand. CS has two control parameters. The structure of CS is similar to those of DE and ABC; however, it has excellent problem-solving success in comparison to ABC, DE and some DE variants.

DSA (Differential Search Algorithm) analogically simulates a superorganism migrating between two stopovers. DSA has unique mutation and crossover operators. The mutation operator of DSA contains just one direction pattern apart from the target pattern, and the structure of its crossover operator is very different from the structures of the crossover operators used in advanced DE algorithms. DSA has only two control parameters, which control the degree to which the trial pattern mutates in comparison to the target pattern. Each trial pattern uses the corresponding target pattern for evolving towards stopovers that provide a better fitness value. For obtaining the direction matrix, standard DSA has four different options: Bijective DSA (B-DSA), Surjective DSA (S-DSA), Elitist1 DSA (E1-DSA) and Elitist2 DSA (E2-DSA). In B-DSA, the population evolves at each cycle towards a randomly permuted form of the current population. In S-DSA, the population evolves towards random artificial organisms in order to find relatively better solutions. In E1-DSA, the population evolves towards randomly selected top-best solutions of the original population. In E2-DSA, the population evolves towards the best solution of the original population. In this paper, S-DSA and E2-DSA have been employed.

BSA (Backtracking Search Optimization Algorithm) has a simple structure, but it is a fast and effective algorithm that easily adapts to solving multimodal optimization problems. BSA can be considered an advanced and modernized PSO. The local and global search abilities of BSA are very powerful, and it has new strategies for its crossover and mutation operations. BSA has a short-range memory in which it stores a population from a randomly chosen previous generation for generating the search-direction matrix. Thus, BSA has the advantage of using the experience gained from previous generations when it generates a new trial population.

EXPERIMENTS
Table 1 gives the initial values of the relevant control parameters of ABC, GSA, CS, JADE, DSA and BSA. All the evolutionary algorithms used the same objective function, which aims to maximize the silhouette index value. The analytical validation of clustering results is also important in data clustering analysis. Clustering validity indices are used intensively in the scientific community in order to evaluate clustering results. The most intensively used indexes are the silhouette index, Davies-Bouldin, Calinski-Harabasz, Dunn index, R-squared index, Hubert-Levin (C-index), Krzanowski-Lai index, Hartigan index, root-mean-square standard deviation index, semi-partial R-squared (SPR) index, distance between two clusters (CD) index, weighted inter-intra index, homogeneity index, and separation index. In this paper, we have used the CVAP toolbox in order to compute the clustering indexes. All the simulations have been conducted by using Matlab running on a dual-core Xeon(R) CPU E5-2640 computer.

In order to evaluate the quality of the clustering success of the algorithms, we have used some of the well-known clustering validation indexes (i.e., Davies-Bouldin, Silhouette, Adjusted-Rand, and R-Squared). A low Davies-Bouldin index value indicates that good cluster structures have been computed. A larger silhouette index indicates that better-quality clustering results have been obtained. A higher adjusted-Rand index score indicates that better clustering results have been achieved. A large R-squared index value indicates that the difference between the clusters is big. Because of the huge computer-memory requirements for the computation of some indexes, mean index values of 100 trials have been computed. At each trial, the mean clustering validation index values of 100 runs have been computed, and the results have been tabulated in Table 2 for the first test image. In the second test, we have used an 8 bits/pixel real-world image of [256x256] pixels in size. The cluster indexes computed for the second test image have been tabulated in Table 3.

RESULTS AND CONCLUSIONS
In this paper, we have compared the image clustering performances of some popular classical methods and evolutionary methods. We have used clustering validation indexes in order to evaluate the clustering performances of the related methods. Experimental results exposed that the k-means algorithm is the best classical method for data clustering. The evolutionary algorithms, except GSA, give more successful clustering results when compared to the classical methods. We have used multiple runs for the evolutionary algorithms in order to avoid the effects of their initial conditions, and the best solution obtained by each evolutionary algorithm has been used in the tests. DSA and BSA supplied similar results, and DSA, BSA and CS are detected as the most successful evolutionary algorithms in the tests. BSA is extremely robust, and it converges to almost the same clustering results each time. Despite their clustering success, using evolutionary algorithms for image clustering is time-consuming. Hence, we have been obliged to use small-sized images; however, the hybridization of evolutionary algorithms with classical methods (especially with k-means) gives quite useful results in terms of some clustering validation indexes.

As seen from Table 2, the K-Means algorithm has obtained the most successful clustering results among the classical algorithms. Contrary to the classical algorithms, the evolutionary algorithms, except GSA, have obtained more successful clustering results than the classical methods.
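As a concrete illustration of evolutionary cluster-center search driven by the silhouette objective, the following Python sketch uses a generic mutate-and-select scheme; it is not an implementation of any of the specific algorithms compared above, and all names are illustrative:

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette value: (b - a) / max(a, b) per point, where a is the
    mean distance to the point's own cluster and b is the smallest mean
    distance to any other cluster."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    s = []
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() < 2:
            s.append(0.0)
            continue
        a = D[i, own].sum() / (own.sum() - 1)          # excludes the zero self-distance
        b = min(D[i, labels == c].mean() for c in set(labels.tolist()) if c != labels[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

def evolve_centers(X, k, pop=20, gens=40, seed=0):
    """Toy evolutionary search over sets of cluster centers: mutate each
    candidate with Gaussian noise and keep the mutant when its silhouette
    fitness improves (greedy one-to-one selection)."""
    rng = np.random.default_rng(seed)
    P = [X[rng.choice(len(X), k, replace=False)] for _ in range(pop)]
    def fitness(C):
        labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(1)
        return silhouette(X, labels) if len(set(labels.tolist())) == k else -1.0
    F = [fitness(C) for C in P]
    for _ in range(gens):
        for i in range(pop):
            trial = P[i] + rng.normal(0.0, 0.5, P[i].shape)  # mutation
            f = fitness(trial)
            if f > F[i]:                                      # greedy selection
                P[i], F[i] = trial, f
    best = P[int(np.argmax(F))]
    return best, max(F)
```

Because every fitness evaluation requires a full silhouette computation over all pixel pairs, the cost grows quickly with image size, which mirrors the long convergence times reported above.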

Table 2: Clustering index values for Test-1.

Table 3: Clustering index values for Test-2.