Synthesizing database programs for schema refactoring

Many programs that interact with a database undergo schema refactoring several times during their life cycle. Because this process typically requires significant changes to the program's implementation, schema refactoring is often non-trivial and error-prone. Motivated by this problem, we propose a new technique for automatically synthesizing a new version of a database program given its original version and the source and target schemas. Our method requires no manual user guidance and ensures that the synthesized program is equivalent to the original one. It is also efficient: on database programs extracted from real-world web applications (containing up to 263 functions), it synthesizes new versions with an average synthesis time of 69.4 seconds.
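To make the setting concrete, the following is a minimal sketch of what a schema refactoring and its accompanying program change might look like. Everything in it is an illustrative assumption rather than material from the paper: the `Member` and `Address` tables, the `add_member` and `get_city` functions, and the hand-written target program. It also only tests one sequence of calls, which is far weaker than the equivalence guarantee described above.

```python
import sqlite3


class SourceProgram:
    """Hypothetical original program over the source schema: Member(id, name, city)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE Member (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
        )

    def add_member(self, id_, name, city):
        self.db.execute("INSERT INTO Member VALUES (?, ?, ?)", (id_, name, city))

    def get_city(self, id_):
        return self.db.execute(
            "SELECT city FROM Member WHERE id = ?", (id_,)
        ).fetchall()


class TargetProgram:
    """Hand-written counterpart over the hypothetical target schema:
    Member(id, name, addr_id) and Address(addr_id, city)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE Member (id INTEGER PRIMARY KEY, name TEXT, addr_id INTEGER)"
        )
        self.db.execute(
            "CREATE TABLE Address (addr_id INTEGER PRIMARY KEY, city TEXT)"
        )

    def add_member(self, id_, name, city):
        # One insert in the source program becomes two inserts after refactoring.
        cur = self.db.execute("INSERT INTO Address (city) VALUES (?)", (city,))
        self.db.execute(
            "INSERT INTO Member VALUES (?, ?, ?)", (id_, name, cur.lastrowid)
        )

    def get_city(self, id_):
        # The single-table lookup becomes a join across the split tables.
        return self.db.execute(
            "SELECT Address.city FROM Member JOIN Address "
            "ON Member.addr_id = Address.addr_id WHERE Member.id = ?",
            (id_,),
        ).fetchall()


def run(program):
    """Apply the same sequence of updates and queries to a program."""
    program.add_member(1, "Ann", "Oslo")
    program.add_member(2, "Bob", "Rome")
    return [program.get_city(1), program.get_city(2), program.get_city(3)]


source_results = run(SourceProgram())
target_results = run(TargetProgram())
print(source_results == target_results)
```

Both versions expose the same functions and return the same query results for this call sequence, which is the observable behavior an equivalent refactored program has to preserve. Even in this toy case every function body had to change, which hints at why doing this by hand across hundreds of functions is error-prone.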
