RiTE: Providing On-Demand Data for Right-Time Data Warehousing

Data warehouses (DWs) have traditionally been loaded with data at regular intervals, e.g., monthly, weekly, or daily, using fast bulk-loading techniques. Recently, the trend has been to insert all (or only some) new source data into the DW very quickly, yielding so-called near-real-time or right-time DWs. This is done using regular INSERT statements, which results in insert speeds that are too low. There is thus a great need for a solution that makes inserted data available quickly while still providing bulk-load insert speeds. This paper presents RiTE ("Right-Time ETL"), a middleware system that provides exactly that. A data producer (the ETL process) can insert data that becomes available to data consumers on demand. RiTE includes an innovative main-memory-based catalyst that provides fast storage and offers concurrency control. It supports a number of policies that control the bulk movement of data based on user requirements for persistency, availability, freshness, etc. The system works transparently to both producer and consumers. It is integrated with an open-source DBMS, and experiments show that it provides "the best of both worlds", i.e., INSERT-like data availability with bulk-load speeds (up to 10 times faster).
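
The producer/consumer interplay described above can be illustrated with a minimal sketch. It is not RiTE's implementation or API: the class `MemoryCatalyst`, its methods, and the `max_buffered` threshold are hypothetical names chosen for illustration, SQLite stands in for the DBMS, and a batched `executemany` stands in for a true bulk load. The sketch assumes a single producer and omits concurrency control. Rows are held in memory and moved to the warehouse table in one batch, either when a consumer asks for data or when a simple size-based policy triggers.

```python
import sqlite3


class MemoryCatalyst:
    """Hypothetical in-memory buffer between a producer and a DW table."""

    def __init__(self, conn, table, columns, max_buffered=1000):
        self.conn = conn
        self.table = table
        self.columns = columns
        self.max_buffered = max_buffered  # assumed size-based flush policy
        self.buffer = []

    def insert(self, row):
        """Producer side: buffer the row instead of issuing an INSERT."""
        self.buffer.append(tuple(row))
        if len(self.buffer) >= self.max_buffered:
            self.flush()

    def flush(self):
        """Move all buffered rows to the DW table in a single batch."""
        if not self.buffer:
            return 0
        placeholders = ", ".join("?" for _ in self.columns)
        sql = (
            f"INSERT INTO {self.table} ({', '.join(self.columns)}) "
            f"VALUES ({placeholders})"
        )
        with self.conn:  # one transaction for the whole batch
            self.conn.executemany(sql, self.buffer)
        moved = len(self.buffer)
        self.buffer.clear()
        return moved

    def query(self, sql, params=()):
        """Consumer side: make buffered rows available on demand, then read."""
        self.flush()
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
catalyst = MemoryCatalyst(conn, "sales", ["id", "amount"], max_buffered=3)

catalyst.insert((1, 10.0))
catalyst.insert((2, 20.0))

# Below the threshold, nothing has reached the table yet.
rows_before = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# A consumer query triggers the batch move before reading.
rows_on_demand = catalyst.query("SELECT COUNT(*) FROM sales")[0][0]
```

In this sketch the consumer decides when data must be fresh, while the producer never waits on per-row INSERT statements. Policies based on persistency or availability requirements would need additional triggers beyond the size threshold shown here.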
