Homophilic Clustering by Locally Asymmetric Geometry

Clustering is indispensable for data analysis in many scientific disciplines. Detecting clusters in heavy noise remains challenging, particularly for high-dimensional sparse data. Building on a graph-theoretic framework, this paper proposes a novel algorithm to address the problem. The locally asymmetric geometry of neighborhoods between data points yields a directed similarity graph that models the structural connectivity of the data. Propagating similarity on this directed graph simply through powers of its adjacency matrix leads to an interesting discovery: when the in-degrees are arranged in the order given by sorting the corresponding out-degrees, they self-organize into homophilic layers that reflect the different density distributions of the clusters. We call the resulting plot the Homophilic In-degree figure (HI figure). With the HI figure, we can easily single out the cores of all clusters, identify the boundary between clusters and noise, and visualize the intrinsic structure of the clusters. Based on this in-degree homophily, we also develop a simple and efficient algorithm with linear space complexity for clustering noisy data. Extensive experiments on toy and real-world scientific data validate the effectiveness of our algorithms.
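The pipeline described in the abstract (directed neighborhood graph, propagation by adjacency matrix powers, in-degrees read in out-degree order) can be illustrated with a minimal sketch. The abstract does not specify the graph construction, the edge weights, or the number of propagation steps, so everything below is an assumption made for illustration: a k-nearest-neighbor graph with Gaussian edge weights, a fixed power `steps`, and the hypothetical helper names `directed_knn_graph` and `hi_ordering`. The dense matrices used here are for readability only and do not reflect the linear space complexity claimed for the paper's algorithm.

```python
import numpy as np

def directed_knn_graph(X, k):
    """Assumed construction: row i holds Gaussian-weighted edges i -> j
    for the k nearest neighbors j of point i. Because j being a neighbor
    of i does not imply the reverse, the matrix is generally asymmetric."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)               # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :k]       # k nearest neighbors per point
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    knn_d2 = d2[rows, cols]
    sigma2 = knn_d2.mean()                     # assumed global bandwidth
    A = np.zeros((n, n))
    A[rows, cols] = np.exp(-knn_d2 / sigma2)
    return A

def hi_ordering(A, steps=3):
    """Propagate similarity with a matrix power (steps is an assumed value),
    then return in-degrees arranged by ascending out-degree."""
    P = np.linalg.matrix_power(A, steps)
    out_deg = P.sum(axis=1)                    # row sums: outgoing weight
    in_deg = P.sum(axis=0)                     # column sums: incoming weight
    order = np.argsort(out_deg)
    return in_deg[order], out_deg[order], order

# Toy data (assumed): one dense Gaussian cluster plus uniform background noise.
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.1, size=(60, 2))
noise = rng.uniform(-2.0, 2.0, size=(40, 2))
X = np.vstack([cluster, noise])

A = directed_knn_graph(X, k=5)
in_sorted, out_sorted, order = hi_ordering(A)
```

Plotting `in_sorted` against its index gives a rough analogue of the HI figure under these assumptions; whether the layers described in the abstract appear will depend on `k`, the weighting, and `steps`, none of which are taken from the paper.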
