On the performance of one-to-many data transformations

Relational database systems often support activities such as data warehousing, data cleaning, and data integration. All of these activities require some form of data transformation. Since data often resides in relational databases, data transformations are often specified in SQL, which is based on relational algebra. However, many useful data transformations cannot be expressed as SQL queries because of the limited expressive power of relational algebra. In particular, an important class of data transformations, those that produce several output tuples for a single input tuple, cannot be expressed this way. In this paper, we analyze alternatives for processing one-to-many data transformations using relational database management systems and compare them in terms of expressiveness, optimizability, and performance.
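To make the notion of a one-to-many data transformation concrete, the following is a minimal sketch in Python. It is a hypothetical illustration, not the paper's method: the `Loan` and `Payment` record types, the `split_loan` function, and the `MAX_PAYMENT` limit are all assumptions introduced here. Each input tuple is split into as many output tuples as its `amount` requires, so the number of outputs depends on the data value itself rather than on a fixed query shape. This is the kind of behavior described above as falling outside what a plain relational algebra expression can produce.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Loan(NamedTuple):
    """Hypothetical input tuple: an account and a total amount."""
    acct: str
    amount: int


class Payment(NamedTuple):
    """Hypothetical output tuple: one installment of a loan."""
    acct: str
    seq: int
    amount: int


# Hypothetical per-payment limit; any positive integer works.
MAX_PAYMENT = 100


def split_loan(loan: Loan, limit: int = MAX_PAYMENT) -> Iterator[Payment]:
    """One-to-many mapping: emit ceil(amount / limit) payments for one loan.

    The number of output tuples is determined by the value of loan.amount,
    not by the schema, which is what makes the mapping one-to-many.
    """
    remaining = loan.amount
    seq = 1
    while remaining > 0:
        part = min(remaining, limit)
        yield Payment(loan.acct, seq, part)
        remaining -= part
        seq += 1


def transform(loans: Iterable[Loan]) -> List[Payment]:
    """Apply the mapping to every input tuple and concatenate the outputs."""
    return [payment for loan in loans for payment in split_loan(loan)]


if __name__ == "__main__":
    for p in transform([Loan("A1", 250), Loan("B2", 80)]):
        print(p)
```

With a limit of 100, the loan `A1` of 250 yields three payments (100, 100, 50), while `B2` of 80 yields a single payment. In a relational engine this per-tuple iteration would have to be emulated, for instance by a recursive query or a procedural routine, which is the kind of alternative the abstract says the paper compares.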
