A novel grammar-based genetic programming approach to clustering

Most classical clustering methods require the user to specify the number of clusters in advance. To overcome this limitation, this paper presents a grammar-based Genetic Programming approach to automatic data clustering. An innovative clustering process is devised, tightly coupled to a novel cluster representation that provides intelligible information about the patterns. The effectiveness of the implemented partitioning system is assessed on a medical domain using purpose-defined evaluation indices, and the system is compared with other clustering tools.
