Proximity Curves for Potential-Based Clustering

The concept of a proximity curve and a new algorithm are proposed for obtaining clusters in a finite set of data points in finite-dimensional Euclidean space. Each point is endowed with a potential constructed by means of a multi-dimensional Cauchy density, and these individual potentials combine into an overall anisotropic potential function. Guided by the steepest descent algorithm, the data points are visited and removed one by one; at each stage the overall potential is updated and the magnitude of its local gradient is calculated. The result is a finite sequence of tuples, the proximity curve, whose pattern is analysed to yield a deterministic clustering. The finite set of all such proximity curves, together with a simulation study of their distribution, results in a probabilistic clustering represented by a distribution on the set of dendrograms. A two-dimensional synthetic data set is used to illustrate the proposed potential-based clustering idea. The results obtained are shown to be plausible, since both the ‘geographic distribution’ of the data points and the ‘topographic features’ imposed by the potential function are well reflected in the suggested clustering. For validation on a standard classification and clustering benchmark, experiments are conducted on the Iris data set. The results are consistent with the proposed theoretical framework and with the properties of the data, and they open up new approaches and applications for viewing data processing from different perspectives and for interpreting the contribution of data attributes to patterns.
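The visiting procedure summarised above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm. The Cauchy kernel is assumed isotropic with a hypothetical scale parameter `gamma` (scale matrix γ²I), so a point p contributes f(x) = Γ((d+1)/2) / (π^((d+1)/2) γ^d (1 + |x − p|²/γ²)^((d+1)/2)). The overall potential U is taken as the negative sum of the kernels of the points still present, and the walk starts at the point of lowest potential. The "steepest descent" move is approximated by a step of hypothetical length `step` along −∇U, after which the remaining point nearest to where that step lands is visited. All function names are hypothetical, and the pattern analysis that turns the curve into a clustering is not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def _cauchy_constants(d: int, gamma: float) -> Tuple[float, float]:
    """Return (c, a) for the isotropic d-dimensional Cauchy density with scale gamma."""
    a = (d + 1) / 2
    c = math.gamma(a) / (math.pi ** a * gamma ** d)
    return c, a


def cauchy_density(x: Point, p: Point, gamma: float) -> float:
    """Cauchy kernel centred at p, evaluated at x (assumption: Sigma = gamma^2 * I)."""
    c, a = _cauchy_constants(len(x), gamma)
    s = sum((xj - pj) ** 2 for xj, pj in zip(x, p)) / gamma ** 2
    return c * (1 + s) ** (-a)


def potential(x: Point, points, gamma: float) -> float:
    """Overall potential U(x): negative sum of the kernels of the given points."""
    return -sum(cauchy_density(x, p, gamma) for p in points)


def potential_gradient(x: Point, points, gamma: float) -> List[float]:
    """Analytic gradient of U at x; -grad U points towards nearby points."""
    c, a = _cauchy_constants(len(x), gamma)
    grad = [0.0] * len(x)
    for p in points:
        s = sum((xj - pj) ** 2 for xj, pj in zip(x, p)) / gamma ** 2
        w = 2 * c * a * (1 + s) ** (-a - 1) / gamma ** 2
        for k in range(len(x)):
            grad[k] += w * (x[k] - p[k])
    return grad


def proximity_curve(data: Sequence[Point], gamma: float = 1.0,
                    step: float = 0.5) -> List[Tuple[int, float]]:
    """Return [(visited index, |grad U| after its removal), ...] (hypothetical rule)."""
    remaining: Dict[int, Point] = dict(enumerate(data))
    if not remaining:
        return []
    # Assumption: start at the data point with the lowest overall potential.
    current = min(remaining,
                  key=lambda i: potential(remaining[i], remaining.values(), gamma))
    curve: List[Tuple[int, float]] = []
    while True:
        x = remaining.pop(current)          # visit and remove the current point
        if not remaining:                   # no points left: the gradient is zero
            curve.append((current, 0.0))
            return curve
        g = potential_gradient(x, remaining.values(), gamma)  # updated potential
        norm = math.hypot(*g)
        curve.append((current, norm))
        # Assumed move: a step of length `step` along -grad U, then visit the
        # remaining point nearest to where that step lands.
        if norm > 0:
            target = [xj - step * gj / norm for xj, gj in zip(x, g)]
        else:
            target = list(x)
        current = min(remaining, key=lambda i: math.dist(target, remaining[i]))
```

For two well-separated groups, such as `[(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]`, this sketch visits one group completely before moving to the other. The recorded gradient magnitude also drops sharply when the last point of the first group is removed, because only distant points still contribute to the potential. A dip of this kind is one pattern a clustering rule could key on.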
