The Changing Relevance of the TLB

A little over a decade ago, Goto and van de Geijn wrote about how important the treatment of the translation lookaside buffer (TLB) is to the performance of matrix multiplication. Crucially, they did not say how important, nor did they provide results that would allow readers to make their own judgement. In this paper, we revisit their work and look at how the performance of their algorithm changes when it is built with different assumed data TLB sizes. Results on three different processors, one relatively modern and two contemporary with Goto and van de Geijn's writings, are examined and compared within a real-world context. Our findings show that, although important when aiming for a place in the TOP500 list, the treatment of the TLB has little practical effect, at least on the architectures we have chosen. We conclude, then, that the relative importance of the various factors that must be taken into account when tuning matrix multiplication (GEMM, the heart of the High Performance LINPACK benchmark, and hence of the TOP500 table) differs dramatically from one processor to another.