An experimental evaluation of tiling and shackling for memory hierarchy management

On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring compiler community has developed locality-enhancing program transformations, the most well known of which is loop tiling. Tiling is restricted to perfectly nested loops, but many imperfectly nested loops can be transformed into perfectly nested loops that can then be tiled. Recently, we proposed an alternative approach to locality enhancement called data shackling. Data shackling reasons about data traversals rather than iteration space traversals, and can be applied directly to imperfectly nested loops. We have implemented shackling in the SGI MIPSPro compiler, which already has a sophisticated implementation of tiling. Our experiments on the SGI Octane workstation with dense numerical linear algebra programs show that shackled code obtains double the performance of tiled code for most of these programs, and obtains five times the performance of tiled code for some versions of Cholesky factorization. Data shackling has been integrated into the SGI MIPSPro compiler product line.
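
As a concrete picture of loop tiling on a perfectly nested loop, the following is a minimal sketch in C, not the MIPSPro implementation. The function names `matmul_naive` and `matmul_tiled`, the row-major layout, and the `tile` parameter are assumptions made for illustration. The tiled version visits the same iterations as the untiled one, but groups them so that each block of the three matrices is reused while it is likely still in cache.

```c
#include <assert.h>
#include <stddef.h>

/* Untiled reference: C += A * B for row-major n x n matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Hypothetical helper: smaller of two sizes, used to clip edge tiles. */
static size_t min_sz(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Tiled version: the outer three loops step over tile origins, the
 * inner three loops sweep one tile x tile x tile block of iterations.
 * The tile size is an assumed tuning parameter, not a value from the
 * paper. */
void matmul_tiled(size_t n, size_t tile,
                  const double *a, const double *b, double *c)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t jj = 0; jj < n; jj += tile)
            for (size_t kk = 0; kk < n; kk += tile)
                for (size_t i = ii; i < min_sz(ii + tile, n); i++)
                    for (size_t j = jj; j < min_sz(jj + tile, n); j++)
                        for (size_t k = kk; k < min_sz(kk + tile, n); k++)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

This transformation applies directly here because all statements sit in the innermost loop. A factorization such as Cholesky has statements at several nesting depths, which is the imperfectly nested case the abstract refers to.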