Materializing Knowledge Bases via Trigger Graphs

The chase is a well-established family of algorithms used to materialize Knowledge Bases (KBs) for tasks such as query answering under dependencies and data cleaning. A common problem with chase algorithms is that they may perform redundant computations. To counter this problem, we introduce the notion of Trigger Graphs (TGs), which guide the execution of the rules so that redundant computations are avoided. We present the results of an extensive theoretical and empirical study that seeks to answer when and how TGs can be computed and what benefits TGs bring when applied to real-world KBs. Our results include algorithms that compute (minimal) TGs. We implemented our approach in a new engine, called GLog, and our experiments show that it can be significantly more efficient than the chase, enabling us to materialize Knowledge Graphs with 17B facts in less than 40 min on a single machine with commodity hardware.
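To make the notion of redundant computation concrete, the following minimal sketch runs a naive, round-based rule application over a tiny set of facts and counts how many derivations produce facts that are already known. It is only an illustration of the problem, not the chase variants studied here, not GLog, and not an implementation of Trigger Graphs. The function names, the tuple encoding of facts, and the two example rules are all assumptions made for this example.

```python
# Illustrative sketch only: naive round-based materialization that counts
# redundant derivations. All names below are hypothetical.

def rule_edge_to_path(facts):
    # edge(x, y) -> path(x, y)
    return [("path", x, y) for (p, x, y) in facts if p == "edge"]

def rule_extend_path(facts):
    # path(x, y), edge(y, z) -> path(x, z)
    edges = [(y, z) for (p, y, z) in facts if p == "edge"]
    return [("path", x, z)
            for (p, x, y) in facts if p == "path"
            for (y2, z) in edges if y2 == y]

def naive_materialize(facts, rules):
    """Apply every rule to all known facts in rounds until nothing new appears.

    Returns the materialized fact set and the number of derivations that
    did not contribute a new fact (the redundant ones).
    """
    facts = set(facts)
    redundant = 0
    while True:
        derived = []
        for rule in rules:
            derived.extend(rule(facts))
        new = set(derived) - facts
        redundant += len(derived) - len(new)
        if not new:
            return facts, redundant
        facts |= new

EDGES = [("edge", "a", "b"), ("edge", "b", "c"), ("edge", "c", "d")]
materialized, redundant = naive_materialize(
    EDGES, [rule_edge_to_path, rule_extend_path])
```

On this three-edge chain, every round re-derives all `path` facts found in earlier rounds, so the number of redundant derivations exceeds the number of facts in the final result. This is the kind of repeated work that a structure guiding rule execution aims to avoid.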
