Fast Core-based Top-k Frequent Pattern Discovery in Knowledge Graphs

A knowledge graph structures information in graph form, representing entities as nodes and relationships between entities as edges. A knowledge graph often consists of a large number of real-world facts, which can support many analytical tasks, e.g., exceptional fact discovery and fact-checking of claims. In this work, we study the core-based top-k frequent pattern discovery problem, which is frequently used as a subroutine in analyzing knowledge graphs. The main challenge is that the search space of candidate patterns grows exponentially with the combinations of nodes and edges in the knowledge graph. To reduce the search space, we devise a novel computation framework, FastPat, with a suite of optimizations. First, we devise a meta-index that avoids generating invalid candidate patterns. Second, we propose an upper bound on the frequency score (i.e., MNI) of a candidate pattern, which prunes unqualified candidates early and prioritizes the enumeration order of the patterns. Lastly, we design a join-based approach to compute the MNI of a candidate pattern efficiently. We conduct extensive experimental studies on real-world datasets to verify the superiority of our proposed method over the baselines. We also demonstrate the utility of the discovered frequent patterns through a case study on a COVID-19 knowledge graph.
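
For readers unfamiliar with the frequency score, MNI is usually read as minimum-image-based support: for each pattern node, count the distinct graph nodes it is mapped to across all embeddings of the pattern, then take the minimum of these counts. The sketch below illustrates only this definition, assuming the embeddings have already been enumerated; it is not FastPat's join-based computation, and the function name `mni_support` and the toy node labels are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def mni_support(embeddings: Iterable[Dict[Hashable, Hashable]]) -> int:
    """Minimum-image-based support of a pattern.

    Each embedding maps a pattern node to the graph node it matches.
    (Hypothetical helper; assumes embeddings are given explicitly.)
    """
    images: Dict[Hashable, set] = {}
    for embedding in embeddings:
        for pattern_node, graph_node in embedding.items():
            # Collect the distinct graph nodes seen for each pattern node.
            images.setdefault(pattern_node, set()).add(graph_node)
    if not images:
        return 0  # no embeddings: the pattern does not occur
    # The support is the size of the smallest image set.
    return min(len(nodes) for nodes in images.values())


# Toy pattern (drug)-[targets]->(protein) with three embeddings.
balanced = [
    {"drug": "d1", "protein": "p1"},
    {"drug": "d1", "protein": "p2"},
    {"drug": "d2", "protein": "p1"},
]
# All embeddings share the same drug, so the drug image has size 1.
skewed = [
    {"drug": "d1", "protein": "p1"},
    {"drug": "d1", "protein": "p2"},
    {"drug": "d1", "protein": "p3"},
]
balanced_score = mni_support(balanced)
skewed_score = mni_support(skewed)
```

In the second toy case the pattern has three embeddings, yet its score is limited by the single distinct drug node. This also hints at why an upper bound is useful: any cheap overestimate of the smallest image set can rule out a candidate before its embeddings are enumerated.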
