Efficient and scalable data evolution with column oriented databases

Database evolution is the process of updating the schema of a database or data warehouse (schema evolution) and migrating the data to the updated schema (data evolution). It is often desired or necessitated when the data or the query workload changes, when the initial schema was not carefully designed, or when more is learned about the database and a better schema is identified. The Wikipedia database, for example, has had more than 170 versions in the past 5 years [8]. Unfortunately, although much research has been done on schema evolution, data evolution has long been a prohibitively expensive process, which essentially evolves the data by executing SQL queries and reconstructing indexes. This cost prevents databases from being changed flexibly and frequently as needs arise, and forces schema designers, who cannot afford mistakes, to be highly cautious. Techniques that enable efficient data evolution would remove this barrier. In this paper, we study the efficiency of data evolution and discuss techniques for data evolution on column oriented databases, which store each attribute, rather than each tuple, contiguously. We show that column oriented databases have better potential than traditional row oriented databases for supporting data evolution, and propose a novel data-level data evolution framework on column oriented databases. Our approach, as suggested by experimental evaluations on real and synthetic data, is much more efficient than query-level data evolution on both row and column oriented databases, which involves unnecessary access to irrelevant data, materialization of intermediate results, and reconstruction of indexes.
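As a rough illustration of why a column oriented layout can avoid touching irrelevant data during a schema change, the following minimal Python sketch contrasts the two layouts for a simple change that drops an attribute. It is a toy in-memory model under stated assumptions, not the framework proposed in the paper; the table contents and the helper names `project_rows` and `project_columns` are hypothetical.

```python
# Illustrative sketch only: toy in-memory layouts, not the paper's framework.
# The table, attribute names, and helper functions are hypothetical.

schema = ("id", "name", "city")

# Row oriented layout: each tuple is stored contiguously.
rows = [
    (1, "alice", "NY"),
    (2, "bob", "LA"),
    (3, "carol", "NY"),
]

# Column oriented layout: each attribute is stored contiguously.
columns = {attr: [r[i] for r in rows] for i, attr in enumerate(schema)}


def project_rows(rows, schema, keep):
    """Row layout: every tuple is read in full and rebuilt."""
    idx = [schema.index(a) for a in keep]
    return [tuple(r[i] for i in idx) for r in rows]


def project_columns(columns, keep):
    """Column layout: kept attributes are reused as-is; others are never read."""
    return {a: columns[a] for a in keep}


# Evolve the schema from (id, name, city) to (id, city).
new_rows = project_rows(rows, schema, ("id", "city"))
new_cols = project_columns(columns, ("id", "city"))
```

In the row layout the change rewrites every tuple, while in the column layout the surviving attribute lists are reused without being copied and the dropped attribute is never accessed.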

[1] Carlo Zaniolo, et al. Graceful database schema evolution, 2008, VLDB 2008.

[2] Patrick E. O'Neil, et al. Improved query performance with variant indexes, 1997, SIGMOD '97.

[3] Daniel J. Abadi, et al. Performance tradeoffs in read-optimized databases, 2006, VLDB.

[4] Changqing Chen, et al. Indexing of multidimensional discrete data spaces and hybrid extensions, 2009.

[5] Samir Khuller, et al. Algorithms for Data Migration with Cloning, 2004, SIAM J. Comput.

[6] David J. DeWitt, et al. Materialization Strategies in a Column-Oriented DBMS, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[7] G. Antoshenkov, et al. Byte-aligned bitmap compression, 1995, Proceedings DCC '95 Data Compression Conference.

[8] Daniel J. Abadi, et al. Column-stores vs. row-stores: how different are they really?, 2008, SIGMOD Conference.

[9] Arie Shoshani, et al. Optimizing bitmap indices with efficient compression, 2006, TODS.

[10] Yi Chen, et al. CODS: Evolving Data Efficiently and Scalably in Column Oriented Databases, 2010, Proc. VLDB Endow.

[11] Enrico Franconi, et al. A Semantic Approach for Schema Evolution and Versioning in Object-Oriented Databases, 2000, Computational Logic.

[12] Daniel J. Abadi, et al. Column Stores for Wide and Sparse Data, 2007, CIDR.

[13] Philip A. Bernstein, et al. Applying Model Management to Classical Meta Data Problems, 2003, CIDR.

[14] Michael Stonebraker, et al. C-Store: A Column-oriented DBMS, 2005, VLDB.

[15] Kesheng Wu, et al. Efficient joins with compressed bitmap indexes, 2009, CIKM.

[16] Adriane Chapman, et al. Making database systems usable, 2007, SIGMOD '07.

[17] Avi Silberschatz, et al. Application of Information Technology: Dynamic Tables: An Architecture for Managing Evolving, Heterogeneous Biomedical Data in Relational Database Management Systems, 2007, J. Am. Medical Informatics Assoc.

[18] W. Scott Spangler, et al. SIMPLE: A Strategic Information Mining Platform for Licensing and Execution, 2009, 2009 IEEE International Conference on Data Mining Workshops.

[19] Raghu Ramakrishnan, et al. Database Management Systems, 1976.

[20] Roland Eckert. Challenge of Design Data Exchange between heterogeneous Database Schema, 2004.

[21] Gultekin Özsoyoglu, et al. Temporal and Real-Time Databases: A Survey, 1995, IEEE Trans. Knowl. Data Eng.

[22] Ming-Chuan Wu, et al. Encoded bitmap indexes and their use for data warehouse optimization, 2001.

[23] Torben Bach Pedersen, et al. Position list word aligned hybrid: optimizing space and performance for compressed bitmaps, 2010, EDBT '10.

[24] Cong Yu, et al. Semantic Adaptation of Schema Mappings when Schemas Evolve, 2005, VLDB.

[25] Junghoo Cho, et al. On the Evolution of Wikipedia, 2007, ICWSM.

[26] Laura M. Haas, et al. The Clio project: managing heterogeneity, 2001, SIGMOD Rec.

[27] Daniel J. Abadi, et al. Scalable Semantic Web Data Management Using Vertical Partitioning, 2007, VLDB.

[28] Arie Shoshani, et al. On the performance of bitmap indices for high cardinality attributes, 2004, VLDB.

[29] Carlo Curino, et al. Automating database schema evolution in information system upgrades, 2009, HotSWUp '09.

[30] Daniel J. Abadi, et al. Integrating compression and execution in column-oriented database systems, 2006, SIGMOD Conference.

[31] Arie Shoshani, et al. Using Bitmap Index for Joint Queries on Structured and Text Data, 2009, New Trends in Data Warehousing and Data Analysis.

[32] Renée J. Miller, et al. Mapping Adaptation under Evolving Schemas, 2003, VLDB.

[33] Jean-Luc Hainaut, et al. Database application evolution: A transformational approach, 2006, Data Knowl. Eng.

[34] Zohra Bellahsene. Schema Evolution in Data Warehouses, 2002, Knowledge and Information Systems.

[35] Carlo Curino, et al. Managing and querying transaction-time databases under schema evolution, 2008, Proc. VLDB Endow.

[36] Carlo Curino, et al. Graceful database schema evolution: the PRISM workbench, 2008, Proc. VLDB Endow.

[37] Young-Gook Ra. Relational Schema Evolution for Program Independency, 2004, CIT.

[38] Marcin Zukowski, et al. MonetDB/X100: Hyper-Pipelining Query Execution, 2005, CIDR.

[39] Samir Khuller, et al. Improved Algorithms for Data Migration, 2006, APPROX-RANDOM.

[40] Anthony Cleve, et al. Co-transformations in Database Applications Evolution, 2005, GTTSE.

[41] Yoo-Ah Kim, et al. Data migration to minimize the total completion time, 2005, J. Algorithms.

[42] Amela Karahasanovic, et al. Visualizing impacts of database schema changes - A controlled experiment, 2001, Proceedings IEEE Symposia on Human-Centric Computing Languages and Environments.

[43] Alexandra Poulovassilis, et al. Schema Evolution in Data Warehousing Environments - A Schema Transformation-Based Approach, 2004, ER.

[44] Hakan Ferhatosmanoglu, et al. Enhanced bitmap indexes for large scale data management, 2009.