Performance engineering case study: heap construction

We study the behaviour of three methods for constructing a binary heap on a computer with a hierarchical memory. The methods considered are the original one proposed by Williams [1964], in which elements are repeatedly inserted into a single heap; the improvement by Floyd [1964], in which small heaps are repeatedly merged into bigger heaps; and a more recent method proposed, e.g., by Fadel et al. [1999], in which a heap is built layerwise. For each method we analyse both the worst-case number of instructions and the worst-case number of cache misses. It is well known that Floyd's method has the best instruction count. Let <i>N</i> denote the size of the heap to be constructed, let <i>B</i> denote the number of elements that fit into a cache line, and let <i>c</i> and <i>d</i> be some positive constants. Our analysis shows that, under reasonable assumptions, repeated insertion and layerwise construction both incur at most <i>cN</i>/<i>B</i> cache misses, whereas repeated merging, as programmed by Floyd, can incur more than (<i>dN</i> log<sub>2</sub> <i>B</i>)/<i>B</i> cache misses. However, for our memory-tuned versions of repeated insertion and repeated merging, the number of cache misses incurred is close to the optimal bound <i>N</i>/<i>B</i>. In addition to these theoretical findings, we report practical experience that we hope will be useful to others doing experimental algorithmic work.
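To make the first two methods concrete, the following is a minimal Python sketch of repeated insertion and repeated merging on an array-based max-heap. It follows the usual textbook formulation with 0-based indexing and is not the memory-tuned code analysed in the paper; the function names (`sift_up`, `sift_down`, `build_by_insertion`, `build_by_merging`, `is_max_heap`) are illustrative assumptions. Layerwise construction is omitted because it relies on a selection routine that is beyond a short example.

```python
def sift_up(a, j):
    """Move a[j] towards the root until its parent is not smaller."""
    x = a[j]
    while j > 0:
        parent = (j - 1) // 2
        if a[parent] >= x:
            break
        a[j] = a[parent]      # pull the smaller parent down one level
        j = parent
    a[j] = x


def sift_down(a, i, n):
    """Move a[i] towards the leaves of the heap stored in a[0:n]."""
    x = a[i]
    while True:
        child = 2 * i + 1
        if child >= n:
            break
        # Pick the larger of the two children, if a right child exists.
        if child + 1 < n and a[child + 1] > a[child]:
            child += 1
        if a[child] <= x:
            break
        a[i] = a[child]       # pull the larger child up one level
        i = child
    a[i] = x


def build_by_insertion(a):
    """Repeated insertion: grow a single heap a[0:j] one element at a time."""
    for j in range(1, len(a)):
        sift_up(a, j)


def build_by_merging(a):
    """Repeated merging: join two subheaps under each internal node, bottom-up."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)


def is_max_heap(a):
    """Check that every element is no larger than its parent."""
    return all(a[(j - 1) // 2] >= a[j] for j in range(1, len(a)))
```

Both builders work in place and produce a valid max-heap, although generally not the same arrangement of elements. The sketch makes no attempt to control the memory access pattern, so it illustrates only how the two methods traverse the array, not their cache behaviour.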
