Sorting in the Presence of Branch Prediction and Caches: Fast Sorting on Modern Computers

Sorting is one of the most important and most studied problems in computer science. Many good algorithms exist, offering various trade-offs between efficiency, simplicity and memory use. However, most of these algorithms were discovered decades ago, when computer architectures were much simpler than they are today. Branch prediction and cache memories are two developments in computer architecture that have a particularly large impact on the performance of sorting algorithms. This report describes a study of the behaviour of sorting algorithms on branch predictors and caches.

Our work on branch prediction is almost entirely new and yields a number of important results. In particular, we show that insertion sort causes the fewest branch mispredictions of any comparison-based algorithm; that optimizations such as the choice of pivot in quicksort can have a large impact on the predictability of branches; and that advanced two-level branch predictors are usually worse at predicting branches in sorting algorithms than simpler branch predictors. In many cases it is possible to draw links between classical theoretical analyses of algorithms and their branch prediction behaviour.

The other main work described in this report is an analysis of the behaviour of sorting algorithms on modern caches. Over the last decade there has been considerable interest in optimizing sorting algorithms to reduce the number of cache misses. We experimentally study the cache performance of both classical sorting algorithms and a variety of cache-optimized algorithms proposed by LaMarca and Ladner. Our experiments cover a much wider range of algorithms than other work, including the O(N^2) sorts, radixsort and shellsort, all within a single framework. We discover a number of new results, particularly relating to the branch prediction behaviour of cache-optimized sorts. We also developed a number of other improvements to the algorithms, such as removing the need for a sentinel in classical heapsort.

Overall, we found that a cache-optimized radixsort was the fastest sort in our study; the absence of comparison branches means that the algorithm causes almost no branch mispredictions.
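
To make the notion of a branch misprediction in a comparison-based sort concrete, the following is a minimal sketch rather than the simulation setup used in the study. It instruments the inner-loop comparison of a textbook insertion sort with a toy 2-bit saturating counter, one of the simpler predictor designs. The names `TwoBitCounter` and `insertion_sort_instrumented` are hypothetical, the initial counter state is an arbitrary choice, and the loop-bound test `j >= 0` is deliberately not modelled.

```python
import random


class TwoBitCounter:
    """Toy 2-bit saturating counter (assumed model, not the study's simulator).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self):
        self.state = 2  # start at "weakly taken" (arbitrary assumption)
        self.branches = 0
        self.mispredictions = 0

    def record(self, taken):
        """Score the prediction for one branch outcome, then update the counter."""
        predicted_taken = self.state >= 2
        self.branches += 1
        if predicted_taken != taken:
            self.mispredictions += 1
        # Move toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
        return taken


def insertion_sort_instrumented(items):
    """Sort a copy of `items`; return (sorted_list, predictor).

    Only the data-dependent comparison a[j] > key is fed to the predictor.
    """
    a = list(items)
    predictor = TwoBitCounter()
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # "Taken" here means the element is shifted and the loop continues.
        while j >= 0 and predictor.record(a[j] > key):
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a, predictor


if __name__ == "__main__":
    random.seed(0)
    data = [random.randrange(10_000) for _ in range(2_000)]
    result, predictor = insertion_sort_instrumented(data)
    print("comparison branches:", predictor.branches)
    print("mispredictions:     ", predictor.mispredictions)
```

The counts produced by this sketch depend on the input and on the assumed predictor model, so they illustrate the measurement idea only and should not be read as reproducing the results of the report.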
