Geodesic distance based fuzzy c-medoid clustering - searching for central points in graphs and high dimensional data

Clustering high-dimensional data and identifying central nodes in a graph are complex and computationally expensive tasks. We use the k-nearest-neighbor (k-NN) graph of high-dimensional data as an efficient representation of the hidden structure of the clustering problem. Initial cluster centers are determined by graph centrality measures, and the centers are then fine-tuned by minimizing fuzzy-weighted geodesic distances. The shortest-path-based representation is analogous to the concept of transitive closure: distances between objects that were never directly compared are inferred through chains of known neighbors. Therefore, our algorithm can cluster networks, or even more complex and abstract objects, based on their partially known pairwise similarities. The algorithm is shown to be effective in identifying senior researchers in a co-author network, central cities in topographical data, and clusters of documents represented by high-dimensional feature vectors.
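The pipeline outlined above (k-NN graph, geodesic distances, centrality-based seeding, fuzzy medoid refinement) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The k-NN graph is assumed to be symmetrized and weighted by Euclidean distance. Geodesic distances are computed by running Dijkstra's algorithm from every node. The membership and medoid updates follow the standard fuzzy c-medoids (FCMdd) scheme, with the geodesic distance used directly as the dissimilarity and a fuzzifier `m > 1`. The seeding rule in `central_init` is a hypothetical heuristic: it takes the node with the highest closeness centrality, then adds nodes that score highest on closeness times geodesic distance to the nearest chosen seed. The abstract only says that graph centrality measures are used. All function names are hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-NN graph as adjacency dicts {neighbor: Euclidean distance}.
    Assumption: edge i-j is kept if j is among i's k nearest neighbors or vice versa."""
    n = len(points)
    adj = [{} for _ in range(n)]
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d
    return adj


def geodesic_distances(adj):
    """All-pairs shortest-path (geodesic) distances via Dijkstra from every node."""
    n = len(adj)
    D = []
    for s in range(n):
        dist = [math.inf] * n
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry, a shorter path was already found
            for v, w in adj[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        D.append(dist)
    return D


def central_init(D, c):
    """Hypothetical seeding heuristic (assumption): most central node by closeness
    first, then nodes maximizing closeness * distance to the nearest chosen seed."""
    n = len(D)
    closeness = [(n - 1) / sum(row) for row in D]
    seeds = [max(range(n), key=lambda v: closeness[v])]
    while len(seeds) < c:
        rest = [v for v in range(n) if v not in seeds]
        seeds.append(max(rest, key=lambda v: closeness[v] * min(D[s][v] for s in seeds)))
    return seeds


def memberships(D, medoids, m):
    """FCMdd memberships: u[i][j] proportional to (1 / D[medoid_i][j]) ** (1 / (m - 1))."""
    n = len(D)
    U = [[0.0] * n for _ in medoids]
    for j in range(n):
        d = [D[med][j] for med in medoids]
        if 0.0 in d:  # j coincides with a medoid: crisp membership
            U[d.index(0.0)][j] = 1.0
            continue
        inv = [(1.0 / dij) ** (1.0 / (m - 1.0)) for dij in d]
        total = sum(inv)
        for i, value in enumerate(inv):
            U[i][j] = value / total
    return U


def update_medoids(D, U, m):
    """New medoid of cluster i: the node minimizing sum_j u_ij^m * D[node][j]."""
    n = len(D)
    medoids = []
    for row in U:
        w = [u ** m for u in row]
        medoids.append(min(range(n), key=lambda cand: sum(w[j] * D[cand][j] for j in range(n))))
    return medoids


def fuzzy_c_medoids(D, init, m=2.0, max_iter=100):
    """Alternate membership and medoid updates until the medoids stop changing."""
    medoids = list(init)
    for _ in range(max_iter):
        new = update_medoids(D, memberships(D, medoids, m), m)
        if new == medoids:
            break
        medoids = new
    return medoids, memberships(D, medoids, m)


def geodesic_fcmdd(points, k, c, m=2.0, max_iter=100):
    """End-to-end sketch: k-NN graph -> geodesic distances -> seeding -> FCMdd."""
    D = geodesic_distances(knn_graph(points, k))
    if any(math.isinf(x) for row in D for x in row):
        raise ValueError("k-NN graph is disconnected; increase k")
    return fuzzy_c_medoids(D, central_init(D, c), m, max_iter)
```

Because `fuzzy_c_medoids` reads only the distance matrix `D`, the same refinement step could be applied to a network given directly as adjacency dicts by skipping `knn_graph`. `geodesic_distances` then fills in every pair of nodes from the known edges, which is how the shortest-path representation copes with partially known similarities. The all-pairs Dijkstra loop is the simplest correct choice for a sketch; for large graphs, a sparse shortest-path routine would be the natural replacement.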
