Master-detail clustering using merged indexes

AbstractMerged indexes are B-trees that contain multiple traditional indexes and interleave their records based on a common sort order. In relational databases, merged indexes implement ‘‘master-detail clustering’’ of related records, e.g., orders and order details. Thus, merged indexes shift de-normalization from the logical level of tables and rows to the physical level of indexes and records, which is a much more appropriate place for it. For object-oriented applications, clustering can reduce the I/O cost for joining rows in related tables to a fraction compared to traditional indexes, with additional beneficial effects on buffer pool requirements. Prior research has covered merged indexes without providing much guidance for their implementation. Enabling the design proposed here is a strict separation of B-tree and index into two layers of abstraction. In addition, this paper provides algorithms (i) for data layout including bitmap indexes and column stores, (ii) for concurrency control and recovery including locking of individual keys and of complex objects, (iii) for update operations including bulk insertions, bulk deletions, and deferred index maintenance, (iv) for adding and removing individual indexes within a merged index, (v) for enforcement of relational integrity constraints from uniqueness constraints to foreign key constraints, and (vi) for query processing including caching in query execution plans dominated by nested iteration and index navigation. In the proposed design for merged indexes, the set of tables, views, and indexes can evolve without restriction. The set of clustering columns can also evolve freely. A relational query processor can search and update index records just as in traditional indexes. With these abilities, the proposed design may finally bring general master-detail clustering to traditional databases together with its advantages in performance and cost. ZusammenfassungMerged Indexes (eine Art physischer Sammelindex) sind spezielle B-Bäume, die mehrere herkömmliche Indexe vereinigen und ihre Einträge in einer gemeinsamen Sortierreihenfolge speichern. In relationalen Datenbanken implementieren Merged Indexes eine ,,hierarchische Clusterbildung‘‘ (master-detail clustering) zusammengehöriger Sätze, beispielsweise von Bestellungen und Bestellposten. Damit verlagern Merged Indexes die Denormalisierung von der logischen Ebene der Tabellen und Zeilen auf die physische Ebene der Indexe und Sätze, was eine viel besser geeignete Stelle dafür ist. Im Vergleich zum Einsatz herkömmlicher Indexe können bei objektorientierten Anwendungen durch Clusterbildung die E/A-Kosten für den Verbund in Beziehung stehender Tabellen auf einen Bruchteil gesenkt werden, wobei zusätzlich Spareffekte bei der Puffernutzung erzielt werden. In früheren Publikationen werden Merged Indexes ohne spezielle Hinweise zu ihrer Implementierung erwähnt. Unsere Vorgehensweise bei ihrer Realisierung sieht eine strikte Trennung von B-Baum und Index in zwei Abstraktionsebenen vor. Zusätzlich liefert unser Beitrag Algorithmen (i) für die Datenabbildung einschließlich der Bitlisten-Indexe und der spaltenorientierten Speicherung, (ii) für Mehrbenutzersynchronisation und Recovery einschließlich spezieller Sperren für einzelne Schlüsselwerte und komplexe Objekte, (iii) für Aktualisierungsoperationen einschließlich Einfügungen und Löschungen großer Datenmengen, (iv) für das Hinzufügen und Entfernen einzelner Indexe innerhalb eines Merged Index, (v) zur Kontrolle relationaler Integritätsbedingungen wie Unique- und Fremdschlüsselbedingungen und (vi) für die Anfrageverarbeitung einschließlich Caching-Vorkehrungen bei Anfrageausführungsplänen mit hohem Anteil an geschachtelter Iteration und Index-Navigation. In unserem Entwurf für Merged Indexes kann die Menge der beteiligten Tabellen, Sichten und Indexe unbeschränkt erweitert werden. Auch die Menge der Attribute (Spalten) mit Clusterbildung unterliegt keinen Beschränkungen. Ein relationaler Anfrageprozessor kann Indexeinträge genauso wie in herkömmlichen Indexen suchen und modifizieren. Diese Eigenschaften unseres Entwurfs ermöglichen es schließlich, das Konzept der allgemeinen hierarchischen Clusterbildung in herkömmliche Datenbanken zu integrieren und dabei ihre Vorteile hinsichtlich Leistung und Kosten zu erhalten.

[1]  Goetz Graefe Algorithms for merged indexes , 2007, BTW.

[2]  Stefano Stefani,et al.  Query processing for SQL updates , 2004, SIGMOD '04.

[3]  Goetz Graefe,et al.  Microsoft SQL Server (Chapter 27) , 2001, Database System Concepts, 4th Edition..

[4]  Andreas Reuter,et al.  Principles of transaction-oriented database recovery , 1983, CSUR.

[5]  David B. Lomet,et al.  Key Range Locking Strategies for Improved Concurrency , 1993, VLDB.

[6]  Paul R. Mcjones The 1995 SQL Reunion: People, Project, and Politics, May 29, 1995 , 1997, Digital System Research Center Report.

[7]  Goetz Graefe,et al.  Multi-table joins through bitmapped join indices , 1995, SGMD.

[8]  James R. Hamilton,et al.  An Architecture for Modular Data Centers , 2006, CIDR.

[9]  Patrick E. O'Neil,et al.  The Escrow transactional method , 1986, TODS.

[10]  C. Mohan,et al.  ARIES/KVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes , 1990, VLDB.

[11]  LometDavid The evolution of effective B-tree , 2001 .

[12]  Andreas Reuter,et al.  Transaction Processing: Concepts and Techniques , 1992 .

[13]  Goetz Graefe,et al.  Sorting And Indexing With Partitioned B-Trees , 2003, CIDR.

[14]  Theo Härder Implementing a generalized access path structure for a relational database system , 1978, TODS.

[15]  Michael J. Carey,et al.  A Concurrency Control Algorithm for Memory-Resident Database Systems , 1989, FODO.

[16]  Goetz Graefe,et al.  The five-minute rule ten years later, and other computer storage rules of thumb , 1997, SGMD.

[17]  Goetz Graefe,et al.  Hash Joins and Hash Teams in Microsoft SQL Server , 1998, VLDB.

[18]  David B. Lomet The evolution of effective B-tree: page organization and techniques: a personal account , 2001, SGMD.

[19]  Goetz Graefe,et al.  Hierarchical locking in B-tree indexes , 2007, BTW.

[20]  Rudolf Bayer,et al.  Prefix B-trees , 1977, TODS.

[21]  Irving L. Traiger,et al.  Granularity of locks in a shared data base , 1975, VLDB '75.

[22]  Henry F. Korth,et al.  Locking Primitives in a Database System , 1983, JACM.

[23]  Guy M. Lohman,et al.  Differential files: their application to the maintenance of large databases , 1976, TODS.

[24]  Irving L. Traiger,et al.  Granularity of Locks in a Large Shared Data Base. , 1975, VLDB 1975.

[25]  Goetz Graefe,et al.  Data compression and database performance , 1991, [Proceedings] 1991 Symposium on Applied Computing.

[26]  C. Mohan,et al.  ARIES/IM: an efficient and high concurrency index management method using write-ahead logging , 1992, SIGMOD '92.

[27]  Michael Stonebraker,et al.  C-Store: A Column-oriented DBMS , 2005, VLDB.

[28]  Daniel J. Abadi,et al.  Performance tradeoffs in read-optimized databases , 2006, VLDB.

[29]  Mark R. Tuttle,et al.  Redo Recovery after System Crashes , 1995, VLDB.

[30]  Rohit Jain,et al.  Efficient Search of Multi-Dimensional B-Trees , 1995, VLDB.

[31]  Goetz Graefe B-tree indexes, interpolation search, and skew , 2006, DaMoN '06.

[32]  Patrick Valduriez,et al.  Join indices , 1987, TODS.

[33]  T. H. Merrett,et al.  A class of data structures for associative searching , 1984, PODS.

[34]  Volker Markl,et al.  Integrating the UB-Tree into a Database System Kernel , 2000, VLDB.

[35]  Marvin H. Solomon,et al.  The GMAP: a versatile tool for physical data independence , 1996, The VLDB Journal.

[36]  Alan Jay Smith,et al.  The performance impact of I/O optimizations and disk improvements , 2004, IBM J. Res. Dev..

[37]  Phillip M. Fernandez Red brick warehouse: a read-mostly RDBMS for open SMP platforms , 1994, SIGMOD '94.

[38]  Daniel J. Abadi,et al.  Column Stores for Wide and Sparse Data , 2007, CIDR.

[39]  Ashok M. Joshi,et al.  Adaptive Locking Strategies in a Multi-node Data Sharing Environment , 1991, VLDB.

[40]  Goetz Graefe,et al.  Transaction support for indexed views. , 2004, SIGMOD 2004.

[41]  Meikel Pöss,et al.  Data Compression in Oracle , 2003, VLDB.

[42]  Tobin J. Lehman,et al.  Locking and Latching in a Memory-Resident Database System , 1992, VLDB.

[43]  Arie Shoshani,et al.  Optimizing bitmap indices with efficient compression , 2006, TODS.

[44]  Theo Härder,et al.  Access path support for referential integrity in SQL2 , 1996, The VLDB Journal.

[45]  C. Mohan,et al.  Algorithms for creating indexes for very large tables without quiescing updates , 1992, SIGMOD '92.

[46]  Goetz Graefe,et al.  Executing Nested Queries , 2003, BTW.

[47]  Guy M. Lohman,et al.  Query Optimization in the IBM DB2 Family. , 1993 .

[48]  Goetz Graefe,et al.  B-tree indexes and CPU caches , 2001, Proceedings 17th International Conference on Data Engineering.