Benchmarking the Chase

The chase is a family of algorithms used in a number of data management tasks, such as data exchange, answering queries under dependencies, query reformulation with constraints, and data cleaning. It is well established as a theoretical tool for understanding these tasks, and a number of prototype systems implementing it have been developed. While individual chase-based systems and particular optimizations of the chase have been experimentally evaluated in the past, we provide the first comprehensive and publicly available benchmark (test infrastructure and a set of test scenarios) for evaluating chase implementations across a wide range of assumptions about the dependencies and the data. We used our benchmark to compare chase-based systems with one another on data exchange and query answering tasks, as well as with systems from closely related communities that can solve similar tasks. Our evaluation yielded a number of new insights into the factors that affect the performance of chase implementations.
