One-to-many data transformations through data mappers

The optimization capabilities of RDBMSs make them attractive for executing data transformations. Although many useful data transformations can be expressed as relational queries, an important class cannot: transformations that produce several output tuples for a single input tuple. To overcome this limitation, we propose extending relational algebra with a new operator, the data mapper. In this paper, we formalize the data mapper operator and investigate some of its properties. We then propose a set of algebraic rewriting rules that enable the logical optimization of expressions with mappers, and we prove their correctness. Finally, we experimentally study the proposed optimizations and identify the key factors that influence the optimization gains.
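To make the one-to-many idea concrete, the following Python sketch shows a transformation that emits several output tuples per input tuple, and one rewriting of the kind the abstract alludes to. It is an illustration only, not the paper's formal operator: the `Loan` and `Installment` schemas, the `split_loan` function, and the `mapper` helper are all hypothetical names introduced here, and relations are modeled as plain lists rather than sets.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple


class Loan(NamedTuple):          # hypothetical input schema
    account: str
    amount: int                  # total amount owed, in cents
    months: int                  # number of installments


class Installment(NamedTuple):   # hypothetical output schema
    account: str
    month: int
    due: int                     # amount due this month, in cents


def split_loan(loan: Loan) -> Iterator[Installment]:
    """Emit one output tuple per installment of a single input tuple."""
    if loan.months <= 0:
        return                   # an input tuple may also yield no output
    base, extra = divmod(loan.amount, loan.months)
    for m in range(1, loan.months + 1):
        # Spread the remainder over the first `extra` installments.
        yield Installment(loan.account, m, base + (1 if m <= extra else 0))


def mapper(fn: Callable[[Loan], Iterable[Installment]],
           relation: Iterable[Loan]) -> List[Installment]:
    """Apply a one-to-many function to every tuple and collect the results."""
    out: List[Installment] = []
    for t in relation:
        out.extend(fn(t))
    return out


loans = [Loan("A", 100, 3), Loan("B", 50, 2)]

# Filter after mapping: every loan is expanded, then most rows are discarded.
late = [i for i in mapper(split_loan, loans) if i.account == "A"]

# Filter before mapping: valid here because `account` is copied unchanged,
# so the predicate can be evaluated on the input tuples instead.
early = mapper(split_loan, [l for l in loans if l.account == "A"])
```

In this sketch both plans return the same installments, but the second one expands fewer input tuples. That is the intuition behind pushing a selection below a mapper; the conditions under which such a rewriting is sound in general are the subject of the paper and are not captured by this example.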
