Performance Analysis of One-to-Many Data Transformations

Relational database systems often support activities such as data warehousing, data cleaning and data integration. All of these activities require some form of data transformation. Since data often resides in relational databases, data transformations are often specified using SQL, which is based on relational algebra. However, many useful data transformations cannot be expressed as SQL queries because of the limited expressive power of relational algebra. In particular, an important class of data transformations, those that produce several output tuples for a single input tuple, cannot be expressed in that way. In this report, we analyze alternatives for processing one-to-many data transformations using relational database systems, and compare them in terms of expressiveness, optimizability and performance.
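As an illustration of what a one-to-many transformation looks like, the following minimal Python sketch expands each input tuple into a number of output tuples that depends on a value stored in the tuple itself. The `Loan` and `Payment` schemas, the `expand` and `transform` functions, and the sample data are hypothetical and chosen only for illustration; they are not taken from the report.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Loan(NamedTuple):
    """Hypothetical input tuple."""
    loan_id: int
    amount: int        # total amount, in cents
    installments: int  # how many output tuples this input tuple yields


class Payment(NamedTuple):
    """Hypothetical output tuple."""
    loan_id: int
    seq: int
    amount: int        # installment amount, in cents


def expand(loan: Loan) -> Iterator[Payment]:
    """Produce one Payment per installment of a single Loan."""
    if loan.installments <= 0:
        raise ValueError("installments must be positive")
    base, extra = divmod(loan.amount, loan.installments)
    for seq in range(1, loan.installments + 1):
        # The first `extra` installments absorb the rounding remainder,
        # so the payments always sum to the loan amount.
        yield Payment(loan.loan_id, seq, base + (1 if seq <= extra else 0))


def transform(loans: Iterable[Loan]) -> List[Payment]:
    """Apply the one-to-many expansion to every input tuple."""
    return [payment for loan in loans for payment in expand(loan)]


if __name__ == "__main__":
    for payment in transform([Loan(1, 100, 3), Loan(2, 50, 1)]):
        print(payment)
```

The number of output tuples per input tuple is not fixed in advance; it is read from the `installments` field at run time. This data-dependent fan-out is the property that makes such transformations awkward to state as a single relational algebra expression.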