Contrast Subgraph Mining from Coherent Cores

Graph pattern mining methods can extract informative and useful patterns from large-scale graphs and capture the underlying principles hidden in an overwhelming amount of information. Contrast analysis is a keystone in various fields and has proven effective at mining valuable information, yet it has long been overlooked in graph pattern mining. In this paper, we therefore introduce the concept of a contrast subgraph: a subset of nodes whose edges or edge weights differ significantly between two given graphs defined over the same node set. The major challenge is the gap between contrast and informativeness. Because noisy edges are pervasive in real-world graphs, pursuing contrast alone may yield subgraphs that consist purely of noise. To avoid such meaningless subgraphs, we use similarity as the cornerstone of contrast. Specifically, we first identify a coherent core, which is a small subset of nodes with similar edge structures in the two graphs, and then induce contrast subgraphs from the coherent cores. Moreover, we design a general family of coherence and contrast metrics and derive a polynomial-time algorithm that efficiently extracts contrast subgraphs. Extensive experiments verify the necessity of introducing coherent cores as well as the effectiveness and efficiency of our algorithm, and real-world applications demonstrate the tremendous potential of contrast subgraph mining.
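The abstract does not spell out its coherence and contrast metrics or its polynomial-time algorithm. Purely as an illustration of the two notions, the sketch below assumes one simple pair of metrics: contrast as the density (total weight divided by node count) of the absolute edge-weight difference between the two graphs, and coherence as the density of their element-wise minimum. To search for a high-contrast subset it swaps in a different, well-known technique, Charikar-style greedy peeling for the densest subgraph, applied to the difference graph. This is not the paper's method and does not use coherent cores; the dict-of-frozenset graph representation and every function name here are hypothetical.

```python
from itertools import combinations

# Hypothetical representation: each graph maps an unordered node pair
# (a frozenset of two nodes) to a non-negative edge weight.
# Pairs that are missing from the dict have weight 0.


def weight(g: dict, u, v) -> float:
    """Edge weight of {u, v} in g, or 0.0 if the edge is absent."""
    return g.get(frozenset((u, v)), 0.0)


def contrast_density(g1: dict, g2: dict, nodes: set) -> float:
    """Assumed contrast metric: sum of |w1 - w2| over pairs in `nodes`, divided by |nodes|."""
    if not nodes:
        return 0.0
    total = sum(abs(weight(g1, u, v) - weight(g2, u, v))
                for u, v in combinations(sorted(nodes), 2))
    return total / len(nodes)


def coherence_density(g1: dict, g2: dict, nodes: set) -> float:
    """Assumed coherence metric: sum of min(w1, w2) over pairs in `nodes`, divided by |nodes|."""
    if not nodes:
        return 0.0
    total = sum(min(weight(g1, u, v), weight(g2, u, v))
                for u, v in combinations(sorted(nodes), 2))
    return total / len(nodes)


def peel_max_contrast(g1: dict, g2: dict, nodes: set) -> tuple:
    """Greedy peeling on the |w1 - w2| graph: repeatedly drop the node with the
    smallest weighted degree and keep the best-scoring intermediate subset."""
    remaining = set(nodes)
    best_set, best_score = set(remaining), contrast_density(g1, g2, remaining)
    while len(remaining) > 1:
        victim = min(remaining, key=lambda u: sum(
            abs(weight(g1, u, v) - weight(g2, u, v)) for v in remaining if v != u))
        remaining.remove(victim)
        score = contrast_density(g1, g2, remaining)
        if score > best_score:
            best_set, best_score = set(remaining), score
    return best_set, best_score


# Toy example: triangle a-b-c exists only in g1; path c-d-e exists in both.
g1 = {frozenset(p): 1.0 for p in [("a", "b"), ("b", "c"), ("a", "c"),
                                   ("c", "d"), ("d", "e")]}
g2 = {frozenset(p): 1.0 for p in [("c", "d"), ("d", "e")]}
```

On this toy input, `peel_max_contrast(g1, g2, {"a", "b", "c", "d", "e"})` returns `({"a", "b", "c"}, 1.0)`, while that set's coherence is 0 because it shares no edges across the two graphs. Contrast alone cannot tell whether such a subset reflects meaningful structure or noise, which is the concern the abstract raises and its coherent-core step is meant to address.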