Reverse data management

Database research mainly focuses on forward-moving data flows: source data is transformed through queries, aggregations, and view definitions to form a new target instance, possibly with a different schema. This Forward Paradigm underpins most data management tasks today, such as querying, data integration, and data mining. We contrast this forward processing with Reverse Data Management (RDM), where an action must be performed on the input data in order to achieve a desired outcome in the output data. Some data management tasks already fall under this paradigm, for example updates through views, data generation, and data cleaning and repair. RDM is, by necessity, conceptually more difficult to define and computationally harder to achieve. Today, however, as more and more of the available data is derived from other data, there is a growing need to modify the input in order to achieve a desired effect on the output, which motivates a systematic study of RDM. We define the Reverse Data Management problem and classify RDM problems into four categories. We illustrate known examples of RDM problems and classify them under these categories. Finally, we introduce a new type of RDM problem, How-To Queries.
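To make the contrast concrete, the following minimal sketch pairs a forward query with one reverse task of the kind the abstract mentions, a deletion propagated through a view. The relation names (`works_on`, `projects`), the sample tuples, and the brute-force search are all assumptions introduced for illustration; they are not taken from the paper, and the exhaustive search is exponential in the size of the source.

```python
from itertools import combinations


def forward(works_on, projects):
    """Forward direction: join the two source relations on the project
    and return the derived view of (employee, client) pairs."""
    return {
        (emp, client)
        for emp, proj in works_on
        for proj2, client in projects
        if proj == proj2
    }


def reverse_delete(works_on, projects, unwanted):
    """Reverse direction: find a smallest set of source tuples whose
    deletion removes `unwanted` from the view.

    Hypothetical brute-force search: try deletion sets of increasing
    size and return the first one that works."""
    source = [("works_on", t) for t in sorted(works_on)]
    source += [("projects", t) for t in sorted(projects)]
    for k in range(len(source) + 1):
        for removed in combinations(source, k):
            w = works_on - {t for rel, t in removed if rel == "works_on"}
            p = projects - {t for rel, t in removed if rel == "projects"}
            if unwanted not in forward(w, p):
                return list(removed)
    return None


# Illustrative source data (assumed, not from the paper).
works_on = {("ann", "p1"), ("ann", "p2"), ("bob", "p1")}
projects = {("p1", "acme"), ("p2", "acme")}

view = forward(works_on, projects)
fix_bob = reverse_delete(works_on, projects, ("bob", "acme"))
fix_ann = reverse_delete(works_on, projects, ("ann", "acme"))
```

In this sketch the forward step is a single evaluation of the query, whereas the reverse step has to search over candidate changes to the input. Removing `("bob", "acme")` needs one source deletion, while removing `("ann", "acme")` needs two because that view tuple has two independent derivations.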
