Tiresias: the database oracle for how-to queries

How-to queries answer fundamental data analysis questions of the form: "How should the input change in order to achieve the desired output?" As a Reverse Data Management problem, evaluating how-to queries is harder than evaluating their "forward" counterpart, hypothetical or what-if queries. In this paper, we present Tiresias, the first system that supports how-to queries. It allows the definition and integrated evaluation of a large class of constrained optimization problems, specifically Mixed Integer Programming problems, on top of a relational database system. Tiresias generates the problem variables, constraints and objectives by issuing standard SQL statements, which allows it to be integrated with any RDBMS. The contributions of this work are the following: (a) we define how-to queries using possible world semantics and propose the specification language TiQL (Tiresias Query Language), based on simple extensions to standard Datalog; (b) we define translation rules that generate a Mixed Integer Program (MIP) from TiQL specifications, which can then be solved using existing tools; (c) Tiresias implements powerful "data-aware" optimizations that are beyond the capabilities of modern MIP solvers and dramatically improve system performance; and (d) an extensive performance evaluation on the TPC-H dataset demonstrates the effectiveness of these optimizations, particularly highlighting the ability to apply divide-and-conquer methods that break MIP problems into smaller instances.
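To make the terms above concrete, the following is a toy sketch of a how-to query viewed as an integer program. It is not TiQL and not Tiresias's SQL-based MIP generation; everything in it is an illustrative assumption: a hypothetical ORDERS relation, hypothetical per-supplier revenue targets, integer quantity changes bounded to [-2, 2], and exhaustive enumeration of possible worlds standing in for a real MIP solver. Because each revenue constraint mentions only one supplier's rows, the problem splits into independent subproblems that can be solved separately. This is one simple instance of the general divide-and-conquer idea; the grouping rule used here (by supplier) is an assumption, not the paper's partitioning algorithm.

```python
from collections import defaultdict
from itertools import product

# Hypothetical input relation: (order_id, supplier, unit_price, quantity).
ORDERS = [
    (1, "s1", 10, 3),
    (2, "s1", 5, 4),
    (3, "s2", 8, 2),
    (4, "s2", 2, 6),
]
# Hypothetical desired output: minimum revenue per supplier.
TARGETS = {"s1": 60, "s2": 40}
# Assumed bound: each quantity (an integer variable) may change by at most 2 units.
DELTAS = range(-2, 3)


def partition(orders):
    """Group rows by the single constraint their variables appear in (here: supplier)."""
    groups = defaultdict(list)
    for row in orders:
        groups[row[1]].append(row)
    return groups


def solve_component(rows, target):
    """Enumerate every possible world of one subproblem; keep the cheapest feasible one."""
    best = None
    for deltas in product(DELTAS, repeat=len(rows)):
        new_qty = [qty + d for (_, _, _, qty), d in zip(rows, deltas)]
        if min(new_qty) < 0:
            continue  # quantities must stay non-negative
        revenue = sum(price * q for (_, _, price, _), q in zip(rows, new_qty))
        if revenue < target:
            continue  # violates the revenue constraint
        cost = sum(abs(d) for d in deltas)  # objective: total absolute change
        if best is None or cost < best[0]:
            best = (cost, {row[0]: d for row, d in zip(rows, deltas)})
    return best


def how_to(orders, targets):
    """Solve each independent component separately and merge the partial plans."""
    total, plan = 0, {}
    for supplier, rows in partition(orders).items():
        result = solve_component(rows, targets[supplier])
        if result is None:
            return None  # one infeasible component makes the whole query infeasible
        total += result[0]
        plan.update(result[1])
    return total, plan


print(how_to(ORDERS, TARGETS))  # prints (3, {1: 1, 2: 0, 3: 2, 4: 0})
```

Solving the four variables jointly would enumerate 5^4 = 625 worlds, while the two independent components need only 2 × 5^2 = 50, and the optimum is the same because no constraint spans both components. A real system would pass each component to a MIP solver instead of enumerating it.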