Analyzing data-centric applications: Why, what-if, and how-to

In this paper, we consider the analysis of complex applications that query and update an underlying database during their operation. We focus on three classes of analytical questions that are important for application owners and users alike: Why was a result generated? What would the result be if the application logic or the database were modified in a particular way? How can one interact with the application to achieve a particular goal? Answering these questions efficiently is a fundamental step towards optimizing the application and its use. Noting that provenance has been a key component in answering similar questions in the context of database queries, we develop a provenance-based model and efficient algorithms for these problems in the context of data-centric applications. Novel challenges here include the dynamic update of data, combined with the possibly complex workflows allowed by applications. We nevertheless achieve theoretical guarantees for the algorithms' performance, and experimentally show their efficiency and usefulness, even in the presence of complex applications and large-scale data.
