α-Clusterable Sets

Despite the growing interest in clustering research over recent decades, a unified clustering theory that is independent of any particular algorithm, underlying data structure, or even objective function has not been formulated so far. In this paper, we take the first steps towards a theoretical foundation of clustering by proposing a new notion of "clusterability" of data sets, based on the density of the data within a specific region. Specifically, we give a formal definition of what we call an "α-clusterable" set, and we use this notion to prove that the principles proposed in Kleinberg's impossibility theorem for clustering [25] are consistent. We further propose an unsupervised clustering algorithm based on the notion of an α-clusterable set. The proposed algorithm exploits the ability of the well-known and widely used particle swarm optimization [31] to maximize the recently proposed window density function [38]. The clustering quality obtained compares favorably with that of several other well-known clustering algorithms.
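As a rough illustration of the idea of using particle swarm optimization to maximize a window density function, the following minimal Python sketch searches for the center of a dense window in a toy data set. It is not the algorithm of the paper: the abstract does not define the window density function, so `window_density` assumes it counts the points inside an axis-aligned window of half-width `alpha` around a center. The name `pso_maximize`, its coefficients (commonly used PSO values), the search bounds, and the toy data are likewise hypothetical choices made for this sketch.

```python
import random


def window_density(center, points, alpha):
    """ASSUMED definition: count the points inside the axis-aligned
    window of half-width alpha centered at `center`."""
    return sum(
        all(abs(p_i - c_i) <= alpha for p_i, c_i in zip(p, center))
        for p in points
    )


def pso_maximize(objective, bounds, n_particles=30, n_iters=100, seed=0,
                 inertia=0.72, c1=1.49, c2=1.49):
    """Plain global-best PSO (hypothetical helper, not from the paper).
    Returns the best position found and its objective value."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Toy data: two Gaussian blobs of different sizes (illustrative only).
data_rng = random.Random(1)
points = ([(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(100)]
          + [(data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)) for _ in range(40)])

center, density = pso_maximize(
    lambda c: window_density(c, points, alpha=1.0),
    bounds=[(-2, 7), (-2, 7)],
)
print("window center:", center, "points inside:", density)
```

A single run of this sketch locates only one dense window. How several windows are found and turned into clusters is not stated in the abstract, so it is left out here.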

[1] James C. Bezdek, et al. Pattern Recognition with Fuzzy Objective Function Algorithms, 1981, Advanced Applications in Pattern Recognition.

[2] G. H. Ball, et al. A clustering technique for summarizing multivariate data, 1967, Behavioral Science.

[3] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[4] Francesco Lisi, et al. Clustering Financial Data for Mutual Fund Management, 2008.

[5] M. N. Vrahatis, et al. Efficient Unsupervised Clustering through Intelligent Optimization, 2009.

[6] Michael Ian Shamos, et al. Computational geometry: an introduction, 1985.

[7] Ajith Abraham, et al. Swarm Intelligence in Data Mining (Studies in Computational Intelligence), 2006.

[8] Shai Ben-David, et al. Clusterability: A Theoretical Study, 2009, AISTATS.

[9] Andrew Zisserman, et al. Advances in Neural Information Processing Systems (NIPS), 2007.

[10] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[11] Jiawei Han, et al. CLARANS: A Method for Clustering Objects for Spatial Data Mining, 2002, IEEE Trans. Knowl. Data Eng.

[12] Joachim M. Buhmann, et al. A theory of proximity based clustering: structure detection by optimization, 2000, Pattern Recognit.

[13] C. H. Chen, et al. Handbook of Pattern Recognition and Computer Vision, 1993.

[14] Michael N. Vrahatis, et al. The New k-Windows Algorithm for Improving the k-Means Clustering Algorithm, 2002, J. Complex.

[15] Michael J. A. Berry, et al. Data mining techniques - for marketing, sales, and customer support, 1997, Wiley computer publishing.

[16] Swagatam Das, et al. Automatic Clustering Using an Improved Differential Evolution Algorithm, 2007.

[17] Mauro Birattari, et al. Swarm Intelligence, 2012, Lecture Notes in Computer Science.

[18] Richard C. Dubes, et al. Cluster Analysis and Related Issues, 1993, Handbook of Pattern Recognition and Computer Vision.

[19] Jian Jhen Chen, et al. K-means clustering versus validation measures: a data-distribution perspective, 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.

[20] Jon M. Kleinberg, et al. An Impossibility Theorem for Clustering, 2002, NIPS.

[21] A. Engelbrecht, et al. Self-Adaptive Differential Evolution Methods for Unsupervised Image Classification, 2006, 2006 IEEE Conference on Cybernetics and Intelligent Systems.

[22] James Kennedy, et al. Particle swarm optimization, 2002, Proceedings of ICNN'95 - International Conference on Neural Networks.

[23] Dimitris K. Tasoulis, et al. Financial forecasting through unsupervised clustering and neural networks, 2006, Oper. Res.

[24] Dimitris K. Tasoulis, et al. Unsupervised clustering in mRNA expression profiles, 2006, Comput. Biol. Medicine.

[25] Shai Ben-David, et al. Measures of Clustering Quality: A Working Set of Axioms for Clustering, 2008, NIPS.

[26] Ching-Yi Chen, et al. Particle swarm optimization algorithm and its application to clustering analysis, 2004, 2012 Proceedings of 17th Conference on Electrical Power Distribution.

[27] Anil K. Jain, et al. Algorithms for Clustering Data, 1988.

[28] C. Perna, et al. Mathematical and statistical methods in insurance and finance, 2008.

[29] Dimitris K. Tasoulis, et al. The new window density function for efficient evolutionary unsupervised clustering, 2005, 2005 IEEE Congress on Evolutionary Computation.

[30] Marc Teboulle, et al. Grouping Multidimensional Data - Recent Advances in Clustering, 2006.

[31] B. Ripley, et al. Pattern Recognition, 1968, Nature.

[32] R. Suganya, et al. Data Mining Concepts and Techniques, 2010.

[33] Phipps Arabie, et al. An Overview of Combinatorial Data Analysis, 1996.

[34] G. De Soete, et al. Clustering and Classification, 2019, Data-Driven Science and Engineering.

[35] Mohammed J. Zaki, et al. Clusterability Detection and Initial Seed Selection in Large Data Sets, 1999.

[36] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[37] Leandro N. de Castro, et al. Data Clustering with Particle Swarms, 2006, 2006 IEEE International Conference on Evolutionary Computation.

[38] Andries Petrus Engelbrecht, et al. Data clustering using particle swarm optimization, 2003, The 2003 Congress on Evolutionary Computation (CEC '03).

[39] Sandra Paterlini, et al. Differential evolution and particle swarm optimisation in partitional clustering, 2006, Comput. Stat. Data Anal.

[40] Narendra Ahuja, et al. Advances in Image Understanding: A Festschrift for Azriel Rosenfeld, 1996.

[41] Michael N. Vrahatis, et al. Particle Swarm Optimization and Intelligence: Advances and Applications, 2010.

[42] Andries P. Engelbrecht, et al. Computational Intelligence: An Introduction, 2002.

[43] Ajith Abraham, et al. Swarm Intelligence in Data Mining, 2009, Swarm Intelligence in Data Mining.

[44] George Karypis, et al. Criterion Functions for Clustering on High-Dimensional Data, 2006, Grouping Multidimensional Data.

[45] Dimitris K. Tasoulis, et al. Improving the orthogonal range search k-windows algorithm, 2002, 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2002).