Efficient sorting using registers and caches

Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, and streaming behavior. Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines. A key step toward developing better models is to quantify the performance effects of features not reflected in the models. This paper explores the effect of memory system features on sorting performance. We introduce a new cache-conscious sorting algorithm, R-MERGE, which achieves better performance in practice than algorithms that are superior in the theoretical models. R-MERGE is designed to minimize memory stall cycles rather than cache misses by considering features common to many system designs.
