Mining frequent neighborhood patterns in a large labeled graph

Frequent subgraphs have long been an important class of target patterns in pattern mining research. Most approaches deal with databases that hold many graph transactions, e.g., the chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to prune candidate patterns efficiently. In the emerging scenario of single-graph databases, such as Google's Knowledge Graph and Facebook's social graph, the traditional support measure becomes trivial (either 0 or 1). To the best of our knowledge, however, all attempts to redefine support for a single graph have resulted in measures that either lose DCP or are no longer semantically intuitive. This paper targets pattern mining in the single-graph setting. We propose mining a new class of patterns called frequent neighborhood patterns, which is free from the "DCP-intuitiveness" dilemma of mining frequent subgraphs in a single graph. A neighborhood is a specific topological pattern in which a vertex is embedded, and the pattern is frequent if it is shared by a large portion of the vertices (above a given threshold). We show that the new patterns not only maintain DCP, but also have interpretations as significant as those of subgraph patterns. Experiments on real-life datasets support the feasibility of our algorithms on relatively large graphs, as well as their capability to mine interesting knowledge that is not discovered by prior methods.
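To make the vertex-based notion of support concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a toy directed graph with vertex and edge labels, restricts neighborhood patterns to simple paths leaving a pivot vertex, and takes support to be the fraction of all vertices embedded in the pattern. The names `matches` and `support`, the labels, and the data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: vertex -> label, and labeled directed edges.
labels: Dict[str, str] = {
    "a1": "author", "a2": "author", "a3": "author",
    "p1": "paper", "p2": "paper", "c1": "venue",
}
edges: List[Tuple[str, str, str]] = [  # (source, edge_label, target)
    ("a1", "writes", "p1"),
    ("a2", "writes", "p2"),
    ("p1", "published_in", "c1"),
]

# A path-shaped pattern: steps of (edge_label, target_vertex_label).
Pattern = List[Tuple[str, str]]


def matches(v: str, pattern: Pattern, adj: Dict[str, List[Tuple[str, str]]]) -> bool:
    """Return True if some walk starting at v follows every step of the pattern."""
    frontier = {v}
    for edge_label, vertex_label in pattern:
        frontier = {
            t
            for u in frontier
            for (el, t) in adj.get(u, [])
            if el == edge_label and labels[t] == vertex_label
        }
        if not frontier:
            return False
    return True


def support(pivot_label: str, pattern: Pattern) -> float:
    """Fraction of all vertices that carry pivot_label and are embedded in the pattern.

    Assumption: the denominator is the total number of vertices in the graph.
    """
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for s, el, t in edges:
        adj.setdefault(s, []).append((el, t))
    hits = sum(
        1 for v, lab in labels.items()
        if lab == pivot_label and matches(v, pattern, adj)
    )
    return hits / len(labels)


short = support("author", [("writes", "paper")])
longer = support("author", [("writes", "paper"), ("published_in", "venue")])
print(short, longer)
```

In this toy graph two of the six vertices are authors who write a paper, and only one of them writes a paper that is published in a venue. Extending the pattern can only shrink the set of matching vertices, which is the downward-closure behavior the abstract refers to.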
