Generalizing Design of Support Measures for Counting Frequent Patterns in Graphs

Frequent subgraph mining (FSM) is an active subject in computer science research. One major challenge in FSM is the design of support measures, which are functions that map a pattern to its frequency count in a database. The current state of the art features a hypergraph-based framework for modeling pattern occurrences that unifies the two main flavors of support measures: the overlap-graph-based maximum independent set (MIS) measure and the minimum-image/instance-based (MNI) measures. To explore the middle ground between these two groups and to guide the development of new support measures, we present general sufficient conditions for designing new support measures in the hypergraph framework; these conditions apply to MNI and to other support measures that are not covered by the overlap-graph framework. We use the sufficient conditions to generalize MNI and the minimum-instance (MI) measure into user-defined linear-time measures. Furthermore, we show that a maximum independent subedge set (MISS) measure developed from the sufficient conditions can fill the gap between MIS and MI in both computational complexity and support count.
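To make the two ends of this spectrum concrete, the following minimal Python sketch computes an MNI-style count and an MIS-style count on a toy example. It is an illustration under stated assumptions, not the implementation used in this work: the function names `mni_support` and `mis_support`, the representation of each occurrence as a dictionary from pattern node to data vertex, and the toy data are all hypothetical, and the MIS routine is an exponential brute-force search suitable only for tiny inputs.

```python
from itertools import combinations


def mni_support(occurrences):
    """MNI-style count: the smallest number of distinct data vertices
    that any single pattern node is mapped to across all occurrences."""
    if not occurrences:
        return 0
    pattern_nodes = occurrences[0].keys()
    return min(len({occ[v] for occ in occurrences}) for v in pattern_nodes)


def mis_support(occurrences):
    """MIS-style count: the size of the largest set of pairwise
    vertex-disjoint occurrences (brute force, exponential time)."""
    images = [set(occ.values()) for occ in occurrences]
    for k in range(len(images), 0, -1):
        for subset in combinations(images, k):
            if all(a.isdisjoint(b) for a, b in combinations(subset, 2)):
                return k
    return 0


# Hypothetical toy data: a single-edge pattern (u, v) matched against the
# unlabeled path 1 - 2 - 3 - 4, listing every embedding in both directions.
occurrences = [
    {"u": 1, "v": 2}, {"u": 2, "v": 1},
    {"u": 2, "v": 3}, {"u": 3, "v": 2},
    {"u": 3, "v": 4}, {"u": 4, "v": 3},
]

mni = mni_support(occurrences)
mis = mis_support(occurrences)
print(mni, mis)
```

On this toy input the MNI-style count is 4 and the MIS-style count is 2, which illustrates the kind of gap in support count that intermediate measures are meant to fill.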
