Efficient cache use for stencil operations on structured discretization grids

We derive tight bounds on the number of cache misses incurred when evaluating explicit stencil operators on structured grids. Our lower bound is based on the isoperimetric property of the discrete octahedron. Our upper bound is based on the good surface-to-volume ratio of a parallelepiped spanned by a reduced basis of the interference lattice of a grid. Measurements show that our algorithm typically reduces the number of cache misses by a factor of three relative to compiler-optimized code. We show that stencil calculations on grids whose interference lattice has a short vector incur abnormally high numbers of cache misses. We call such grids unfavorable and suggest avoiding them in computations by appropriate padding. Direct measurements on a MIPS R10000 show a good correlation between abnormally high cache-miss counts and unfavorable three-dimensional grids.
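
The notion of an unfavorable grid can be illustrated with a minimal sketch. The abstract does not define the interference lattice, so the sketch assumes one: for a grid of dimensions n1 x n2 x n3 stored with the first index fastest, and a direct-mapped cache holding S array elements, the lattice is taken to be the set of integer offsets (i1, i2, i3) with i1 + n1*i2 + n1*n2*i3 divisible by S, that is, offsets between grid points that map to the same cache location. The function names, the brute-force search, the L1 norm, and the threshold `min_norm` are all hypothetical choices for illustration and are not the algorithm of the paper.

```python
from itertools import product


def shortest_interference_vector(dims, cache_words, radius=8):
    """Return (L1 norm, vector) of the shortest nonzero lattice vector found
    with all components in [-radius, radius], or None if there is none.

    Assumed lattice: offsets (i1, i2, i3) such that
    i1 + n1*i2 + n1*n2*i3 is a multiple of cache_words.
    """
    n1, n2, _ = dims
    best = None
    for v in product(range(-radius, radius + 1), repeat=3):
        if v == (0, 0, 0):
            continue
        # Linear memory offset (in array elements) between two grid points.
        offset = v[0] + n1 * v[1] + n1 * n2 * v[2]
        if offset % cache_words == 0:
            norm = sum(abs(c) for c in v)
            if best is None or norm < best[0]:
                best = (norm, v)
    return best


def pad_first_dimension(dims, cache_words, min_norm=4, max_pad=16, radius=8):
    """Return the first padded grid whose shortest lattice vector (within the
    search radius) has L1 norm >= min_norm, or None if no pad qualifies.

    min_norm is a hypothetical threshold for what counts as "short".
    """
    n1, n2, n3 = dims
    for pad in range(max_pad + 1):
        candidate = (n1 + pad, n2, n3)
        found = shortest_interference_vector(candidate, cache_words, radius)
        if found is None or found[0] >= min_norm:
            return candidate
    return None


if __name__ == "__main__":
    grid, cache = (64, 64, 64), 4096
    print(shortest_interference_vector(grid, cache))
    print(pad_first_dimension(grid, cache))
```

Under these assumptions, a 64 x 64 x 64 grid with a 4096-element cache has a lattice vector of norm 1, because points one plane apart in the third index are exactly 4096 elements apart and collide in the cache. Padding the first dimension changes the lattice and can remove such short vectors.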