Sequential in-core sorting performance for a SQL data service and for parallel sorting on heterogeneous clusters

Abstract: This paper introduces techniques for tuning sequential in-core sorting algorithms in the context of two applications. The first is parallel sorting on systems whose processors do not all run at the same speed. The second is the Zeta-Data Project [M. Koskas, A hierarchical database management algorithm, in: Annales 67 du Lamsade, vol. 2, 2004, pp. 277–317. [9]], which aims to develop novel algorithms for database problems. About 50% of the work done in building indexes is devoted to sorting sets of integers. We develop and compare algorithms designed to sort inputs that contain equal keys. These algorithms are variations of Sedgewick's 3-way Quicksort. To observe performance and to make full use of the processor's functional units and memory system, we use the hardware performance counters that are available on most modern microprocessors. We also develop analytical results for one of our algorithms and compare the expected results with the measurements. For both applications, we show, through detailed experiments on an Athlon processor (a three-way superscalar x86 processor), that L1 data cache misses are not the central problem; rather, good in-core sorting performance depends on achieving a suitable proportion of independent retired instructions.
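For readers unfamiliar with the base algorithm, the following is a minimal sketch of three-way Quicksort using the textbook Dijkstra-style partition (elements less than, equal to, and greater than the pivot). It is an illustration only and is not one of the tuned variants studied in the paper; the function name `quicksort_3way`, the random pivot choice, and the recursion on the smaller side are assumptions made for this example.

```python
import random


def quicksort_3way(a, lo=0, hi=None):
    """Sort a[lo..hi] in place with three-way partitioning and return a.

    Illustrative sketch only: pivot selection and recursion strategy are
    assumptions, not the paper's tuned variants.
    """
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        # Pick a random pivot and move it to the front (assumed strategy).
        p = random.randint(lo, hi)
        a[lo], a[p] = a[p], a[lo]
        pivot = a[lo]

        # Invariant: a[lo..lt-1] < pivot, a[lt..i-1] == pivot,
        # a[gt+1..hi] > pivot, a[i..gt] not yet examined.
        lt, i, gt = lo, lo + 1, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1

        # Keys equal to the pivot (a[lt..gt]) are already in final position.
        # Recurse on the smaller side and loop on the larger one to bound
        # the recursion depth.
        if lt - lo < hi - gt:
            quicksort_3way(a, lo, lt - 1)
            lo = gt + 1
        else:
            quicksort_3way(a, gt + 1, hi)
            hi = lt - 1
    return a
```

Because every key equal to the pivot is excluded from both recursive calls, inputs with many duplicate keys (such as the integer sets that arise when building indexes) shrink quickly at each partitioning step.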

[1] B. Shriver, et al. The Anatomy of a High Performance Microprocessor (Interactive Book/CD-ROM): A Systems Perspective with Cdrom, 1998.

[2] Jean-Luc Gaudiot, et al. Parallel Sorting Algorithms with Sampling Techniques on Clusters with Processors Running at Different Speeds, 2000, HiPC.

[3] S. Lakshmivarahan, et al. Parallel Sorting Algorithms, 1984, Adv. Comput.

[4] Sonal Kothari, et al. Register Efficient Mergesorting, 2000, HiPC.

[5] Donald E. Knuth, et al. Sorting and Searching, 1973.

[6] Sandeep Sen, et al. Towards a theory of cache-efficient algorithms, 2000, SODA '00.

[7] David B. Lomet, et al. AlphaSort: a RISC machine sort, 1994, SIGMOD '94.

[8] Rajeev Raman, et al. Sorting in linear time?, 1995, STOC '95.

[9] Josep-Lluís Larriba-Pey, et al. An analysis of superscalar sorting algorithms on an R8000 processor, 1997, Proceedings 17th International Conference of the Chilean Computer Science Society.

[10] Bruce D. Shriver, et al. The anatomy of a high-performance microprocessor - a systems perspective, 1998.

[11] Stefan Nilsson. The fastest sorting algorithm, 2000.

[12] R. Shackleton. A Quantitative Approach, 2005.

[13] Robert Sedgewick, et al. Fast algorithms for sorting and searching strings, 1997, SODA '97.

[14] Jean-Luc Gaudiot, et al. On a scheme for parallel sorting on heterogeneous clusters, 2002, Future Gener. Comput. Syst.

[15] Jean-Luc Gaudiot, et al. An Over-partitioning Scheme for Parallel Sorting on Clusters with Processors Running at different Speeds, 2000, CLUSTER.

[16] Donald E. Knuth, et al. The art of computer programming, volume 3: (2nd ed.) sorting and searching, 1998.

[17] Michel Koskas. A Hierarchical Database Manager, 2004.

[18] ardie Jules Verne. Evaluation of Two BSP Libraries through Parallel Sorting on Clusters, 2000.

[19] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.

[20] Naila Rahman, et al. Adapting Radix Sort to the Memory Hierarchy, 2001, JEAL.

[21] David A. Patterson, et al. Computer Architecture - A Quantitative Approach, 5th Edition, 1996.

[22] Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.

[23] Ramesh C. Agarwal, et al. A super scalar sort algorithm for RISC processors, 1996, SIGMOD '96.

[24] Robert Sedgewick, et al. The analysis of Quicksort programs, 1977, Acta Informatica.

[25] Jeffrey Scott Vitter, et al. Efficient sorting using registers and caches, 2000, JEAL.

[26] Richard E. Ladner, et al. The influence of caches on the performance of sorting, 1997, SODA '97.