Partitioning techniques for fine-grained indexing

Many data-intensive websites use databases that grow much faster than the rate at which users access the data. Such growing datasets lead to ever-increasing space and performance overheads for maintaining and accessing indexes. Furthermore, access is often heavily skewed, with popular users and recent data accessed much more frequently than the rest. These observations led us to design Shinobi, a system that uses horizontal partitioning both to improve query performance, by clustering the physical data, and to improve insert performance, by indexing only the data that is frequently accessed. We present database design algorithms that optimally partition tables, drop indexes from partitions that are infrequently queried, and maintain these partitions as workloads change. We show a 60× performance improvement over traditionally indexed tables using a real-world query workload derived from a traffic monitoring application.
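The idea of indexing only frequently queried partitions can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions, not Shinobi's actual algorithm or cost model: the `Partition` fields, the three cost constants, and the helpers `partition_for`, `should_index`, and `plan` are all hypothetical. Keys are routed to range partitions, and a partition keeps its index only when the estimated query savings outweigh the index-maintenance cost paid on every insert.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical cost constants in arbitrary units (assumptions, not from the paper).
SCAN_COST_PER_ROW = 0.001      # cost of scanning one row when no index exists
INDEX_LOOKUP_COST = 1.0        # cost of answering one query through an index
INDEX_MAINT_PER_INSERT = 0.5   # extra cost of updating the index on each insert


@dataclass
class Partition:
    name: str
    rows: int            # tuples stored in this horizontal partition
    query_rate: float    # queries per second that touch this partition
    insert_rate: float   # inserts per second that land in this partition


def partition_for(key, boundaries):
    """Map a key to a range partition.

    `boundaries` are sorted split points: partition 0 holds keys below
    boundaries[0], partition i holds [boundaries[i-1], boundaries[i]),
    and the last partition holds keys >= boundaries[-1].
    """
    return bisect_right(boundaries, key)


def should_index(p: Partition) -> bool:
    """Keep an index only if it lowers the partition's estimated cost.

    The base cost of writing a tuple is the same with or without an
    index, so only the extra index-maintenance cost is counted.
    """
    unindexed = p.query_rate * p.rows * SCAN_COST_PER_ROW
    indexed = (p.query_rate * INDEX_LOOKUP_COST
               + p.insert_rate * INDEX_MAINT_PER_INSERT)
    return indexed < unindexed


def plan(partitions):
    """Return a per-partition indexing decision keyed by partition name."""
    return {p.name: should_index(p) for p in partitions}


if __name__ == "__main__":
    parts = [
        Partition("hot", rows=1_000_000, query_rate=20.0, insert_rate=100.0),
        Partition("cold", rows=1_000_000, query_rate=0.01, insert_rate=100.0),
    ]
    print(plan(parts))                     # {'hot': True, 'cold': False}
    print(partition_for(150, [100, 200]))  # 1
```

With these made-up numbers, the heavily queried `hot` partition keeps its index, while the rarely queried but insert-heavy `cold` partition drops it, so inserts into `cold` avoid index maintenance entirely.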
