An experimental study of sorting and branch prediction

Sorting is one of the most important and well-studied problems in computer science. Many good algorithms are known, offering various trade-offs in efficiency, simplicity, memory use, and other factors. However, these algorithms do not take into account features of modern computer architectures that significantly influence performance. Caches and branch predictors are two such features and, while there has been a significant amount of research into the cache performance of general-purpose sorting algorithms, there has been little research on their branch prediction properties. In this paper, we empirically examine the behavior of the branches in all of the most common sorting algorithms. We also consider the effect of cache optimizations on the predictability of the branches in these algorithms. We find that insertion sort has the fewest branch mispredictions of any comparison-based sorting algorithm, that bubble sort and shaker sort operate in a fashion that makes their branches highly unpredictable, that the unpredictability of shellsort's branches improves its caching behavior, and that several cache optimizations have little effect on mergesort's branch mispredictions. We also find that optimizations to quicksort, for example the choice of pivot, have a strong influence on the predictability of its branches. We point out a simple way of removing branch instructions from a classic heapsort implementation and also show that unrolling a loop in a cache-optimized heapsort implementation improves the predictability of its branches. Finally, we note that when sorting random data, two-level adaptive branch predictors are usually no better than simpler bimodal predictors, even though two-level adaptive predictors are almost always superior to bimodal predictors in general.
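To make the notion of counting branch mispredictions concrete, the following is a minimal sketch, not the paper's experimental setup. It assumes a bimodal predictor modeled as a single 2-bit saturating counter attached to the comparison branch in insertion sort's inner loop. The names `BimodalPredictor` and `insertion_sort` are hypothetical, and the loop's bounds check (`j >= 0`) is a separate branch that is not modeled here.

```python
import random


class BimodalPredictor:
    """Assumed model: one 2-bit saturating counter for one static branch.

    Counter values 0-1 predict "not taken"; values 2-3 predict "taken".
    """

    def __init__(self):
        self.counter = 2          # start in the "weakly taken" state
        self.executions = 0
        self.mispredictions = 0

    def record(self, taken):
        """Score the prediction for one branch outcome, then update the counter."""
        predicted_taken = self.counter >= 2
        if predicted_taken != taken:
            self.mispredictions += 1
        self.executions += 1
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)
        return taken


def insertion_sort(items, predictor):
    """Return a sorted copy of items, feeding each comparison outcome to predictor."""
    a = list(items)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # The comparison branch is "taken" when the inner loop keeps shifting.
        while j >= 0 and predictor.record(a[j] > key):
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


if __name__ == "__main__":
    data = [random.random() for _ in range(1000)]
    predictor = BimodalPredictor()
    assert insertion_sort(data, predictor) == sorted(data)
    print(predictor.executions, "comparison branches,",
          predictor.mispredictions, "mispredicted")
```

Running the script on random input prints the number of comparison branches executed and how many of them this simple model mispredicts. A real predictor indexes a table of such counters by branch address, so the single-counter version only illustrates the bookkeeping for one branch.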
