Constrained Local Graph Clustering by Colored Random Walk

Detecting local graph clusters is an important problem in large-scale graph analysis. Given seed nodes in a graph, local clustering aims to find subgraphs around the seed nodes that consist of nodes highly relevant to them. However, existing local clustering methods either allow only a single seed node or assume that all seed nodes come from the same cluster, which does not hold in many real applications. Moreover, assuming that all seed nodes lie in a single cluster discards crucial information about the relations between seed nodes. In this paper, we propose a method that takes advantage of these relations. Given prior knowledge of the community membership of the seed nodes, the method labels seed nodes in the same community with the same color and seed nodes in different communities with different colors. To exploit this information, we introduce a color-based random walk mechanism in which colors are propagated from the seed nodes to every node in the graph. Through the interaction of identical and distinct colors, the supervision provided by the seed nodes is incorporated into the random walk process. We also propose a heuristic strategy that speeds up the algorithm by more than two orders of magnitude. Experimental evaluations show that our clustering method outperforms state-of-the-art approaches by a large margin.
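The idea of propagating colors from seed nodes can be illustrated with a much simpler baseline than the method proposed here. The sketch below is an assumption for illustration only: it runs one independent random walk with restart per color and labels each node with the color that scores highest. It does not model the interaction between colors that the paper introduces, and the function name `colored_rwr`, the restart probability, and the toy graph are all hypothetical.

```python
def colored_rwr(adj, seeds_by_color, restart=0.15, iters=100):
    """Illustrative baseline: one random walk with restart per color.

    adj is a dense adjacency matrix (list of lists); seeds_by_color maps
    a color name to the list of seed node indices carrying that color.
    Walks for different colors do not interact in this sketch.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    scores = {}
    for color, seeds in seeds_by_color.items():
        # The walker restarts uniformly over the seeds of this color.
        restart_vec = [0.0] * n
        for s in seeds:
            restart_vec[s] = 1.0 / len(seeds)
        p = restart_vec[:]
        for _ in range(iters):
            nxt = [restart * r for r in restart_vec]
            for u in range(n):
                if degree[u] == 0:
                    continue
                # Spread the non-restarting mass of u over its neighbors.
                share = (1.0 - restart) * p[u] / degree[u]
                for v in range(n):
                    if adj[u][v]:
                        nxt[v] += share * adj[u][v]
            p = nxt
        scores[color] = p
    # Each node takes the color whose walk visits it most often.
    labels = [max(scores, key=lambda c: scores[c][v]) for v in range(n)]
    return scores, labels


# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

scores, labels = colored_rwr(adj, {"red": [0], "blue": [5]})
print(labels)
```

On this toy graph, seeding node 0 with one color and node 5 with another separates the two triangles. Handling seeds of different colors jointly, rather than independently as here, is the part that the color-based random walk in the paper is designed to address.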
