Caravan: Provisioning for What-If Analysis

What-if analysis (hypothetical deletions, insertions, and modifications) over complex analysis queries is increasingly common, e.g., in forming a business strategy or looking for causal relationships in science. Data analysts are typically interested only in task-specific views of the data, and they expect to manipulate the data interactively in a natural and seamless way, possibly on a phone or tablet, and possibly via a spreadsheet or similar interface, without having to carry the full machinery of a DBMS. The Caravan system enables what-if analysis: fast, lightweight, interactive exploration of alternative answers within views computed over large-scale distributed data sources. Our novel approach is based on creating dedicated provisioned autonomous representations, or PARs. PARs are compiled from the data, the initial analysis queries, and user-specified what-if scenarios. They allow rapid evaluation of what-if scenarios without accessing the original data or performing complex query operations. Importantly, the size of a PAR is governed by the parameters of the what-if analysis and is proportional to the size of the initial query answer rather than to the typically much larger source data. Consequently, many what-if analysis tasks performed through PAR evaluations can be done autonomously on limited-resource devices. We describe our model and architecture, demonstrate preliminary performance results, and present several open implementation and optimization issues.
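
To make the provisioning idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Caravan's actual representation or API. It assumes a single hypothetical query (total sales amount per region) and a single hypothetical scenario family ("what if a given supplier were dropped?"). The names `compile_par` and `evaluate`, and the sample rows, are invented for this example. The compiled structure keeps one partial sum per (region, supplier) pair, so its size depends on the number of answer rows and scenario parameters, not on the number of source rows.

```python
from collections import defaultdict


def compile_par(rows):
    """Fold source rows into one partial sum per (region, supplier).

    Hypothetical stand-in for a PAR: after this step the source rows
    are no longer needed to answer "drop supplier X" scenarios.
    """
    partial = defaultdict(lambda: defaultdict(float))
    for region, supplier, amount in rows:
        partial[region][supplier] += amount
    # Freeze into plain dicts so the result is easy to ship to a small device.
    return {region: dict(parts) for region, parts in partial.items()}


def evaluate(par, dropped=frozenset()):
    """Answer the per-region SUM query under a scenario.

    `dropped` is the set of suppliers hypothetically deleted; an empty
    set reproduces the original query answer.
    """
    return {
        region: sum(v for supplier, v in parts.items() if supplier not in dropped)
        for region, parts in par.items()
    }


# Invented sample data: (region, supplier, amount).
rows = [
    ("east", "acme", 100.0),
    ("east", "bolt", 40.0),
    ("east", "acme", 10.0),
    ("west", "bolt", 25.0),
]

par = compile_par(rows)                         # done once, next to the data
baseline = evaluate(par)                        # original answer
without_acme = evaluate(par, dropped={"acme"})  # what-if: acme removed
```

In this sketch the scenario is evaluated with a few additions over the compiled structure, which is the kind of work a phone or spreadsheet client could plausibly do on its own. Richer scenarios (insertions, modifications, joins) would need a richer representation than these partial sums.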
