Categorical data fuzzy clustering: An analysis of local search heuristics

The fuzzy c-partition of a set of qualitative data is the problem of selecting the c centroids that are most representative of the whole population. In addition, a set of weights wij must be determined, describing the fuzzy membership of pattern i in the cluster represented by centroid j. Both problems are formulated as a single mathematical programming problem, which is an extension of the classic p-median models often used for clustering. The new objective function is neither concave nor convex, and the application requires clustering many thousands of data points; therefore, heuristic methods must be developed to find the best fuzzy partition. In this contribution, four methods are selected, all implementations of metaheuristics previously tested on p-median problems. Here, they are implemented and tested on the fuzzy c-partition problem. All four heuristics implement neighborhood search with different strategies for visiting neighboring solutions: the random restart method (RR), which is used in many commercial software packages and suggested in textbooks; tabu search (TS), which tries to find the best move to escape from a local optimum; variable neighborhood search (VNS), which quickly explores the solution space; and candidate list search (CLS), which explores only interesting starting solutions. No single method is clearly best; performance depends on problem parameters such as c and the number of data points. TS is usually accurate but time-consuming. When c is small, VNS can be a reliable alternative, while when c is large and there are many data points to cluster, CLS provides good results. We point out that the simple RR method, which is sometimes used in commercial codes, performs very poorly: implementing any one of the other neighborhood search algorithms leads to substantial improvements.
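To make the random restart baseline concrete, the following is a minimal sketch and not the authors' implementation. It assumes details the abstract does not state: centroids are restricted to observed patterns (as in p-median models), dissimilarity is the simple matching (Hamming) distance, and the objective is the common fuzzy form sum_i sum_j wij^m d_ij with fuzzifier m, for which the optimal weights at fixed centroids give a per-pattern cost of (sum_j d_ij^(-1/(m-1)))^(-(m-1)). All function names are hypothetical.

```python
import random

def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def fuzzy_cost(data, medoids, m=2.0):
    """Objective value with memberships set optimally for fixed medoids
    (assumed fuzzy p-median form, not taken from the paper)."""
    total = 0.0
    for row in data:
        dists = [hamming(row, data[j]) for j in medoids]
        if min(dists) == 0:
            continue  # full membership in a coincident medoid costs nothing
        total += sum(d ** (-1.0 / (m - 1.0)) for d in dists) ** (-(m - 1.0))
    return total

def local_search(data, medoids, m=2.0):
    """Swap descent: replace one medoid by one non-medoid while it helps."""
    medoids = list(medoids)
    best = fuzzy_cost(data, medoids, m)
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoids)):
            for cand in range(len(data)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = fuzzy_cost(data, trial, m)
                if cost < best - 1e-12:  # accept strict improvements only
                    medoids, best, improved = trial, cost, True
    return medoids, best

def random_restart(data, c, restarts=10, m=2.0, seed=0):
    """RR: run the swap descent from several random starts, keep the best."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(restarts):
        start = rng.sample(range(len(data)), c)
        medoids, cost = local_search(data, start, m)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return sorted(best_medoids), best_cost
```

Calling `random_restart(data, c=2)` on a list of equal-length tuples of category labels returns the indices of the chosen centroids and the objective value. Under the same assumptions, TS, VNS, and CLS would reuse `fuzzy_cost` and the swap neighborhood, differing only in how they leave a local optimum or choose starting solutions.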
