Automated Migration of Hierarchical Data to Relational Tables using Programming-by-Example

While many applications export data in hierarchical formats like XML and JSON, it is often necessary to convert such hierarchical documents to a relational representation. This paper presents a novel programming-by-example approach, and its implementation in a tool called Mitra, for automatically migrating tree-structured documents to relational tables. We have evaluated the proposed technique using two sets of experiments. In the first experiment, we used Mitra to automate 98 data transformation tasks collected from StackOverflow. Our method can generate the desired program for 94% of these benchmarks with an average synthesis time of 3.8 seconds. In the second experiment, we used Mitra to generate programs that can convert real-world XML and JSON datasets to full-fledged relational databases. Our evaluation shows that Mitra can automate the desired transformation for all datasets.
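To make the task concrete, the following is a minimal hand-written sketch in Python of the kind of tree-to-table transformation the abstract describes. The document shape, the field names, and the function `to_tables` are hypothetical illustrations, and the code is not Mitra's synthesis algorithm or its output. In a programming-by-example setting, the user would instead supply a small input document together with the desired output rows, and the tool would be expected to synthesize a program with equivalent behavior.

```python
# Hypothetical hierarchical input: people, each with a nested list of friends.
doc = {
    "people": [
        {"id": 1, "name": "Alice",
         "friends": [{"fid": 2, "years": 3}, {"fid": 3, "years": 5}]},
        {"id": 2, "name": "Bob",
         "friends": [{"fid": 1, "years": 3}]},
        {"id": 3, "name": "Carol", "friends": []},
    ]
}


def to_tables(document):
    """Flatten the tree into two relational tables.

    person(id, name) gets one row per person node;
    friendship(pid, fid, years) gets one row per nested friend node,
    with pid acting as a foreign key into person.
    """
    person_rows = []
    friendship_rows = []
    for p in document["people"]:
        person_rows.append((p["id"], p["name"]))
        for f in p["friends"]:
            # Each child node is paired with the id of its parent node.
            friendship_rows.append((p["id"], f["fid"], f["years"]))
    return person_rows, friendship_rows


person, friendship = to_tables(doc)
print(person)
print(friendship)
```

The nested `friends` lists become rows of a separate table keyed by the parent's `id`, which is the usual way a one-to-many relationship in a tree is represented relationally.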
