Mining Summaries for Knowledge Graph Search

Mining and searching large, heterogeneous knowledge graphs is challenging under real-world resource constraints such as response time. This paper studies a framework that discovers summaries to facilitate knowledge graph search. 1) We introduce a class of summaries characterized by graph patterns. In contrast to conventional summaries defined by frequent subgraphs, these summaries can adaptively summarize entities with similar neighbors up to a bounded number of hops. 2) We formulate the computation of graph summarization as a bi-criteria pattern mining problem. Given a knowledge graph G, the problem is to discover k diversified summaries that maximize the informativeness measure. Although this problem is NP-hard, we show that it is 2-approximable. We also introduce an online mining algorithm that trades off speed and accuracy under given resource constraints. 3) We develop query evaluation algorithms that use the summaries as views. These algorithms efficiently compute (approximate) answers with high accuracy and access only a small number of summaries. Our experimental study verifies that online mining over large knowledge graphs is feasible and can support bounded search in knowledge graphs.
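The abstract states that selecting k diversified, informative summaries is 2-approximable but does not spell out the algorithm. As an illustration only, the sketch below assumes the bi-criteria objective takes the max-sum diversification form of Gollapudi and Sharma (WWW 2009), f(S) = (k-1) * sum of w(s) + 2*lam * sum of d(u, v) over pairs in S. Under that objective, greedily picking the unselected pair that maximizes w(u) + w(v) + 2*lam*d(u, v) is a known 2-approximation when d is a metric and weights are non-negative. It is not presented as the authors' method. The toy data, the coverage-size informativeness score, the Jaccard distance, and the names `SUMMARIES`, `weight`, `dist`, `objective`, `diversified_top_k`, and `lam` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy input: each candidate summary is represented only by the
# set of entities it covers (an assumption; the paper's summaries are graph patterns).
SUMMARIES = {
    "s1": {"a", "b", "c"},
    "s2": {"a", "b"},
    "s3": {"d", "e"},
    "s4": {"f"},
}


def weight(s):
    """Assumed informativeness score: number of covered entities."""
    return len(SUMMARIES[s])


def dist(u, v):
    """Jaccard distance between covered entity sets (a metric)."""
    a, b = SUMMARIES[u], SUMMARIES[v]
    return 1 - len(a & b) / len(a | b)


def objective(S, lam=1.0):
    """Max-sum diversification: (k-1) * sum of weights + 2*lam * sum of pairwise distances."""
    k = len(S)
    return (k - 1) * sum(weight(s) for s in S) + 2 * lam * sum(
        dist(u, v) for u, v in combinations(S, 2)
    )


def diversified_top_k(candidates, k, lam=1.0):
    """Greedy pair selection: repeatedly take the unselected pair maximizing
    d'(u, v) = w(u) + w(v) + 2*lam*d(u, v)."""
    remaining = list(candidates)
    chosen = []
    for _ in range(k // 2):
        if len(remaining) < 2:
            break
        u, v = max(
            combinations(remaining, 2),
            key=lambda p: weight(p[0]) + weight(p[1]) + 2 * lam * dist(p[0], p[1]),
        )
        chosen += [u, v]
        remaining.remove(u)
        remaining.remove(v)
    # For odd k one more element is needed; the approximation bound allows
    # any choice, and this sketch takes the highest-weight remaining one.
    if len(chosen) < k and remaining:
        chosen.append(max(remaining, key=weight))
    return chosen


print(diversified_top_k(SUMMARIES, 3))  # ['s1', 's3', 's2']
```

With k = 2 the sketch prefers s1 and s3 over the more similar pair s1 and s2, because disjoint coverage earns the full distance bonus. This illustrates how the diversity term counterbalances raw informativeness.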
