Effect of node size on the performance of cache-conscious B+-trees

In main-memory databases, the number of processor cache misses has a critical impact on the performance of the system. Cache-conscious indices are designed to improve performance by reducing the number of processor cache misses that are incurred during a search operation. Conventional wisdom suggests that the index's node size should be equal to the cache line size to minimize the number of cache misses and improve performance. As we show in this paper, this design choice ignores additional effects, such as the number of instructions executed and the number of TLB misses, which play a significant role in determining the overall performance. To capture the impact of node size on the performance of a cache-conscious B+-tree (CSB+-tree), we first develop an analytical model based on the fundamental components of the search process. This model is then validated with an actual implementation, demonstrating that the model is accurate. Both the analytical model and experiments confirm that using node sizes much larger than the cache line size can result in better search performance for the CSB+-tree.
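
The trade-off described above can be illustrated with a rough sketch. This is not the paper's analytical model: the function name `search_cost_cycles`, the cost constants, and the simplifications (binary search within a node, one TLB miss per node visited, fanout approximated as node size divided by key size) are all assumptions made for illustration only. It splits an estimated search cost into the three components the abstract names: cache misses, TLB misses, and instructions executed.

```python
import math

def search_cost_cycles(node_bytes, num_keys=10_000_000, key_bytes=4,
                       line_bytes=64, miss_cycles=150, tlb_miss_cycles=100,
                       cmp_cycles=5, node_overhead_cycles=40):
    """Toy per-search cost estimate; every constant here is hypothetical."""
    # Assumed fanout: a node holds roughly node_bytes / key_bytes keys.
    fanout = max(2, node_bytes // key_bytes)

    # Tree height: smallest h such that fanout**h >= num_keys.
    height, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        height += 1

    # Assumed: binary search inside a node touches about log2(lines) + 1 lines.
    lines_per_node = max(1, node_bytes // line_bytes)
    lines_touched = min(lines_per_node,
                        math.ceil(math.log2(lines_per_node)) + 1)
    comparisons = math.ceil(math.log2(fanout))

    cache = height * lines_touched * miss_cycles
    tlb = height * tlb_miss_cycles  # assumed: one TLB miss per node visited
    instr = height * (comparisons * cmp_cycles + node_overhead_cycles)
    return {"height": height, "cache": cache, "tlb": tlb,
            "instr": instr, "total": cache + tlb + instr}

if __name__ == "__main__":
    for size in (64, 128, 256, 512, 1024):
        print(size, search_cost_cycles(size))
```

In this sketch, larger nodes reduce the tree height and therefore the TLB and instruction components, while the cache-miss component grows. Where the total is minimized depends entirely on the assumed constants, so the sketch shows only why node size involves more than cache misses; the conclusion that larger nodes perform better rests on the paper's own model and experiments.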
