Combining Fuzzy Clustering with ANN Classifier for Categorical Data

This article deals with the development of an improved clustering technique for categorical data that is based on identifying points with significant membership in multiple classes. Such points are difficult to assign to clusters, and their assignments often distort the resulting partition of the data. It may therefore be more effective to first identify the points whose cluster assignments are most uncertain and exclude them at the first stage of the algorithm, and then, at the second stage, assign them to one of the identified clusters using an ANN classifier. For the first stage, we use our previously developed genetic algorithm based and simulated annealing based fuzzy clustering techniques, as well as the well-known Fuzzy C-Medoids algorithm, when the number of clusters is known a priori. The performance of the proposed clustering algorithms has been compared with that of the average linkage hierarchical clustering algorithm, in addition to the genetic algorithm based fuzzy clustering, simulated annealing based fuzzy clustering and Fuzzy C-Medoids algorithms with ANN, on a variety of artificial and real-life categorical data sets. Statistical significance tests have also been performed to establish the superiority of the proposed algorithm.
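
The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices in it are assumptions made only for illustration: the first stage uses a Fuzzy C-Medoids-style update (not the GA- or SA-based variants) with the simple matching dissimilarity and a "most central point, then farthest-first" medoid initialization; a point is treated as confused when its two largest memberships differ by less than a hypothetical `margin` parameter; and the ANN is a small one-hidden-layer network on one-hot encoded attributes, trained by per-sample gradient descent. All function names and parameter values are hypothetical.

```python
import math
import random

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))

def memberships(d, medoids, m):
    """u[j][i]: membership of point j in cluster i, proportional to (1/d)^(1/(m-1))."""
    u = []
    for row in d:
        dist = [row[med] for med in medoids]
        if 0 in dist:  # point coincides with a medoid: split membership among such medoids
            w = [1.0 if x == 0 else 0.0 for x in dist]
        else:
            w = [(1.0 / x) ** (1.0 / (m - 1)) for x in dist]
        s = sum(w)
        u.append([x / s for x in w])
    return u

def fuzzy_c_medoids(data, c, m=2.0, iters=20):
    """Fuzzy C-Medoids-style clustering of categorical tuples; returns (medoids, memberships)."""
    n = len(data)
    d = [[mismatch(data[i], data[j]) for j in range(n)] for i in range(n)]
    # Assumed initialization: most central point first, then farthest-first selection.
    medoids = [min(range(n), key=lambda k: sum(d[k]))]
    while len(medoids) < c:
        medoids.append(max(range(n), key=lambda k: min(d[k][q] for q in medoids)))
    for _ in range(iters):
        u = memberships(d, medoids, m)
        # Each new medoid minimizes sum_j u_ij^m * d(candidate, x_j) over all points.
        new = [min(range(n), key=lambda k: sum(u[j][i] ** m * d[k][j] for j in range(n)))
               for i in range(c)]
        if new == medoids:
            break
        medoids = new
    return medoids, memberships(d, medoids, m)

def split_confident(u, margin=0.2):
    """Hypothetical rule: 'confused' if the two largest memberships differ by < margin."""
    confident, confused = [], []
    for j, row in enumerate(u):
        top = sorted(row, reverse=True)
        (confused if top[0] - top[1] < margin else confident).append(j)
    return confident, confused

def one_hot(data):
    """Encode each categorical attribute as a block of 0/1 network inputs."""
    cats = [sorted({row[a] for row in data}) for a in range(len(data[0]))]
    return [[1.0 if row[a] == v else 0.0 for a in range(len(cats)) for v in cats[a]]
            for row in data]

def forward(W1, W2, x):
    """One tanh hidden layer followed by a softmax output layer (no bias terms)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    z = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    e = [math.exp(v - max(z)) for v in z]
    s = sum(e)
    return h, [v / s for v in e]

def train_mlp(X, y, c, hidden=8, lr=0.1, epochs=300, seed=0):
    """Per-sample gradient descent on softmax cross-entropy."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in X[0]] for _ in range(hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(c)]
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, p = forward(W1, W2, x)
            g2 = [p[k] - (1.0 if k == t else 0.0) for k in range(c)]  # dLoss/dlogit
            g1 = [(1 - h[r] ** 2) * sum(g2[k] * W2[k][r] for k in range(c)) for r in range(hidden)]
            for k in range(c):
                for r in range(hidden):
                    W2[k][r] -= lr * g2[k] * h[r]
            for r in range(hidden):
                for i, xi in enumerate(x):
                    W1[r][i] -= lr * g1[r] * xi
    return W1, W2

def cluster_with_ann(data, c, margin=0.2):
    """Stage 1: fuzzy clustering; stage 2: ANN trained on confident points labels the rest."""
    _, u = fuzzy_c_medoids(data, c)
    confident, confused = split_confident(u, margin)
    labels = [max(range(c), key=lambda i: row[i]) for row in u]
    X = one_hot(data)
    W1, W2 = train_mlp([X[j] for j in confident], [labels[j] for j in confident], c)
    for j in confused:
        _, p = forward(W1, W2, X[j])
        labels[j] = max(range(c), key=lambda k: p[k])
    return labels, confused

# Two well-separated groups plus one tuple that is two mismatches away from both final medoids.
data = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"), ("a", "x", "p"),
        ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"), ("b", "z", "r"),
        ("a", "y", "r")]
labels, confused = cluster_with_ann(data, c=2)
print(labels, confused)
```

In this toy run, only the last tuple is excluded at the first stage, because its memberships in the two clusters are equal. The labels of the confident points are kept from the fuzzy partition, and only the excluded point is relabeled by the network, which mirrors the division of labour described above.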
