Paper information - Mining frequent neighborhood patterns in a large labeled graph

Mining frequent neighborhood patterns in a large labeled graph

Over the years, frequent subgraphs have been an important kind of targeted pattern in pattern mining research, where most approaches deal with databases holding a number of graph transactions, e.g., the chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to ensure an efficient pruning of the candidate patterns. When switching to the emerging scenario of single-graph databases such as Google's Knowledge Graph and Facebook's social graph, the traditional support measure turns out to be trivial (either 0 or 1). However, to the best of our knowledge, all attempts to redefine a single-graph support have resulted in measures that either lose DCP, or are no longer semantically intuitive. This paper targets pattern mining in the single-graph setting. We propose mining a new class of patterns called frequent neighborhood patterns, which is free from the "DCP-intuitiveness" dilemma of mining frequent subgraphs in a single graph. A neighborhood is a specific topological pattern in which a vertex is embedded, and the pattern is frequent if it is shared by a large portion (above a given threshold) of vertices. We show that the new patterns not only maintain DCP, but also have equally significant interpretations as subgraph patterns. Experiments on real-life datasets support the feasibility of our algorithms on relatively large graphs, as well as the capability of mining interesting knowledge that is not discovered by prior methods.
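The support measure described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes, purely for illustration, that a neighborhood pattern is a path of labeled, directed edges starting at a pivot vertex, and that support is the fraction of all vertices in the graph that can serve as that pivot. The names `build_index`, `has_neighborhood`, and `support`, and the toy academic graph, are hypothetical.

```python
def build_index(edges):
    """Index directed labeled edges as: source -> edge label -> set of targets."""
    index = {}
    for src, edge_label, dst in edges:
        index.setdefault(src, {}).setdefault(edge_label, set()).add(dst)
    return index


def has_neighborhood(index, labels, pivot, pattern):
    """Check whether `pivot` is embedded in a path-shaped neighborhood pattern.

    `pattern` is (pivot_label, [(edge_label, vertex_label), ...]).
    This path shape is a simplifying assumption, not the paper's definition.
    """
    pivot_label, steps = pattern
    if labels[pivot] != pivot_label:
        return False
    frontier = {pivot}
    for edge_label, vertex_label in steps:
        # Follow edges with the required label to vertices with the required label.
        frontier = {
            nxt
            for cur in frontier
            for nxt in index.get(cur, {}).get(edge_label, ())
            if labels[nxt] == vertex_label
        }
        if not frontier:
            return False
    return True


def support(index, labels, pattern):
    """Fraction of all vertices that share the pattern (assumed denominator)."""
    hits = sum(has_neighborhood(index, labels, v, pattern) for v in labels)
    return hits / len(labels)


# Toy single graph: three authors, two papers, one venue.
labels = {
    "a1": "author", "a2": "author", "a3": "author",
    "p1": "paper", "p2": "paper", "v1": "venue",
}
edges = [
    ("a1", "writes", "p1"),
    ("a2", "writes", "p2"),
    ("p1", "published_in", "v1"),
]
index = build_index(edges)

writes_paper = ("author", [("writes", "paper")])
writes_published = ("author", [("writes", "paper"), ("published_in", "venue")])

s1 = support(index, labels, writes_paper)      # a1 and a2 match: 2 of 6 vertices
s2 = support(index, labels, writes_published)  # only a1 matches: 1 of 6 vertices
print(s1, s2)
```

In this sketch, extending the pattern by one step can only shrink the set of matching pivots, so the longer pattern's support never exceeds the shorter one's. This is the downward-closure behavior the abstract refers to, shown here only for the simplified path-shaped case.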

Ji-Rong Wen | Jialong Han
