Database Servers on Chip Multiprocessors: Limitations and Opportunities

Prior research shows that database system performance is dominated by off-chip data stalls, resulting in a concerted effort to bring data into on-chip caches. At the same time, high levels of integration have enabled the advent of chip multiprocessors and increasingly large (and slow) on-chip caches. These two trends pose the imminent technical and research challenge of adapting high-performance data management software to a shifting hardware landscape. In this paper, we characterize the performance of a commercial database server running on emerging chip multiprocessor technologies. We find that the major bottleneck of current software is data cache stalls, with L2 hit stalls rising from oblivion to become the dominant execution time component in some cases. We analyze the source of this shift and derive a list of features for future database designs to attain maximum performance.
