Mining Connection Pathways for Marked Nodes in Large Graphs

Suppose we are given a large graph in which, by some external process, a handful of nodes are marked. What can we say about these nodes? Are they close together in the graph? Or, if they are segregated, how many groups do they form? We approach this problem by trying to find sets of simple connection pathways between sets of marked nodes. We formalize the problem in terms of the Minimum Description Length principle: a pathway is simple when we need only a few bits to tell which edges to follow such that we visit all nodes in a group. The best partitioning is then the one that requires the fewest bits to describe the paths that visit all the marked nodes. We prove that solving this problem is NP-hard, and introduce DOT2DOT, an efficient algorithm for partitioning marked nodes by finding simple pathways between nodes. Experimentation shows that DOT2DOT correctly groups nodes for which good connection paths can be constructed, while separating distant nodes.
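To make the "few bits to tell which edges to follow" idea concrete, here is a minimal sketch in Python. It is not the DOT2DOT encoding itself: it assumes a simple uniform code in which choosing one of the deg(u) edges leaving node u costs log2(deg(u)) bits. The function name `path_bits` and the toy graph are hypothetical, introduced only for illustration.

```python
import math
from collections import defaultdict

def path_bits(adj, path):
    """Bits to spell out `path` hop by hop under an ASSUMED uniform code:
    picking one of the deg(u) edges at node u costs log2(deg(u)) bits."""
    bits = 0.0
    for u, v in zip(path, path[1:]):
        if v not in adj[u]:
            raise ValueError(f"{u!r} -> {v!r} is not an edge")
        bits += math.log2(len(adj[u]))
    return bits

# Hypothetical toy graph: a chain a-b-c, plus a hub h that touches
# a, c, and six unrelated nodes x1..x6 (so h has degree 8).
edges = [("a", "b"), ("b", "c"), ("a", "h"), ("c", "h")]
edges += [("h", f"x{i}") for i in range(1, 7)]

adj = defaultdict(set)
for u, v in edges:  # undirected: store both directions
    adj[u].add(v)
    adj[v].add(u)

via_chain = path_bits(adj, ["a", "b", "c"])  # two low-degree choices
via_hub = path_bits(adj, ["a", "h", "c"])    # one choice at the hub
print(via_chain, via_hub)
```

Both routes from a to c have two hops, but under this assumed code the route through the high-degree hub costs more bits, so hop count alone does not determine how simple a pathway is.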
