Multithreaded architectures and the sort benchmark

New computer architectures present many challenges to database-system designers. As main memory has grown in size and its latency, measured in processor cycles, has increased, much research has focused on optimizing database-system performance for these new bottlenecks [18, 17, 3, 1, 4, 13, 8]. In this paper, we consider how algorithms designed specifically for newer architectures (featuring simultaneous multithreading (SMT) [20, 14], symmetric multiprocessors (SMP), advanced memory units, chip multiprocessors (CMP), etc.) can help database systems address the cache/memory performance gap. We chose a simple yet important problem: the sort benchmark. Because the traditional sort benchmark [2] is based on a disk sort, we propose a variant of that problem for in-memory sorting in Section 3. As our benchmark platform, we chose the Intel NetBurst [9] architecture as implemented in the Xeon and Pentium 4 processors. Our results show the following:

[1]  John Paul Shen, et al. Speculative Precomputation: Exploring the Use of Multithreading for Latency, 2002.

[2]  David J. DeWitt, et al. DBMSs on a Modern Processor: Where Does Time Go?, 1999, VLDB.

[3]  D. Marr, et al. Hyper-Threading Technology Architecture and Microarchitecture, 2002.

[4]  Peter J. Varman, et al. Percentile Finding Algorithm for Multiple Sorted Runs, 1989, VLDB.

[5]  Anastasia Ailamaki, et al. Improving hash join performance through prefetching, 2004, Proceedings. 20th International Conference on Data Engineering.

[6]  Jack Dongarra, et al. Using PAPI for Hardware Performance Monitoring on Linux Systems, 2001.

[7]  Dean M. Tullsen, et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[8]  David J. Sager, et al. The microarchitecture of the Pentium 4 processor, 2001.

[9]  Dean M. Tullsen, et al. Initial observations of the simultaneous multithreading Pentium 4 processor, 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[10]  Dean M. Tullsen, et al. Simultaneous multithreading: a platform for next-generation processors, 1997, IEEE Micro.

[11]  Jeffrey F. Naughton, et al. Cache Conscious Algorithms for Relational Query Processing, 1994, VLDB.

[12]  Ali R. Hurson, et al. Effects of Multithreading on Cache Performance, 1999, IEEE Trans. Computers.

[13]  Jim Gray, et al. A Minute with Nsort on a 32P NEC Windows Itanium2 Server, 2004.

[14]  Kenneth A. Ross, et al. Making B+-trees cache conscious in main memory, 2000, SIGMOD '00.

[15]  Michael Stonebraker, et al. A measure of transaction processing power, 1985.

[16]  David B. Lomet, et al. AlphaSort: a RISC machine sort, 1994, SIGMOD '94.

[17]  Goetz Graefe, et al. B-tree indexes and CPU caches, 2001, Proceedings 17th International Conference on Data Engineering.

[18]  Martin L. Kersten, et al. Optimizing Main-Memory Join on Modern Hardware, 2002, IEEE Trans. Knowl. Data Eng.

[19]  Rajeev Rastogi, et al. Main-memory index structures with fixed-size partial keys, 2001, SIGMOD '01.