Impact Analysis and Policy-Conforming Rewriting of Evolving Data-Intensive Ecosystems

Data-intensive ecosystems are conglomerations of data repositories surrounded by applications that depend on them for their operation. In this paper, we address the problem of performing what-if analysis for the evolution of the database part of a data-intensive ecosystem: identifying which other parts of the ecosystem are affected by a potential change in the database schema and what the ecosystem will look like once the change has been performed, while retaining the ability to regulate the flow of events. We model the ecosystem as a graph, uniformly covering relations, views, and queries as nodes, and their internal structure and interdependencies as the edges of the graph. We provide a simple language for annotating the modules of the graph with policies that govern their response to evolutionary events. These policies regulate the flow of events and their impact by (i) vetoing (“blocking”) the change in parts that the developers want to retain unaffected and (ii) allowing (“propagating”) the change in parts that need to adapt to the new schema. Our method for the automatic adaptation of ecosystems is based on three algorithms that automatically (i) assess the impact of a change, (ii) compute the need for different variants of an ecosystem’s components, depending on policy conflicts, and (iii) rewrite the modules to adapt to the change. We theoretically prove the coverage of the language, as well as the termination, consistency, and confluence of our algorithms, and experimentally verify our method’s effectiveness and efficiency.
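
As a rough illustration of the impact-assessment idea only, the following minimal sketch walks a dependency graph from a changed relation and applies a per-node policy of "block" or "propagate". The names `assess_impact`, `dependents`, and `policy`, the default policy of "propagate", and the example node names are all assumptions made for this sketch; it does not reproduce the paper's algorithms and ignores variant computation for policy conflicts as well as the rewriting step.

```python
from collections import deque


def assess_impact(dependents, policy, changed):
    """Breadth-first walk from `changed` over the modules that depend on it.

    dependents: dict mapping a node to the nodes that read from it
                (hypothetical representation of the ecosystem graph).
    policy:     dict mapping a node to "block" or "propagate";
                unlisted nodes default to "propagate" (an assumption).
    Returns (adapted, blocked): nodes that accept the change and nodes
    that veto it. A blocking node does not forward the event further.
    """
    adapted, blocked = set(), set()
    seen = set()
    queue = deque(dependents.get(changed, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if policy.get(node, "propagate") == "block":
            blocked.add(node)  # veto: the event stops here
            continue
        adapted.add(node)      # the node must adapt to the new schema
        queue.extend(dependents.get(node, []))
    return adapted, blocked


# Hypothetical ecosystem: one relation, one view, two queries.
dependents = {"EMP": ["V_EMP", "Q_PAYROLL"], "V_EMP": ["Q_REPORT"]}
policy = {"Q_PAYROLL": "block"}
adapted, blocked = assess_impact(dependents, policy, "EMP")
```

In this toy graph, a change to `EMP` reaches `V_EMP` and, through it, `Q_REPORT`, while `Q_PAYROLL` vetoes the change. In the approach described above, such a veto is what would trigger the need for a separate variant of the affected component.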
