Compiler algorithms for efficient use of memory systems

This thesis investigates compiler algorithms that transform programs and data to make efficient use of the underlying memory systems. Despite extensive studies of locality enhancement for perfectly-nested loops, little work has been done for imperfectly-nested loops. In this thesis, two such techniques are presented. The first technique is to tile imperfectly-nested loops so that the utilization of cache memories and the translation lookaside buffer (TLB) is enhanced. We develop a memory cost model to characterize cache reuse and an execution cost model to estimate execution time. Array duplication, which helps remove false dependences, is applied whenever beneficial. Speculative execution is used to overcome premature exits in certain applications. By tiling the outer loop, which encloses several perfectly-nested loops, locality is exploited both across the different inner loops and within the outer loop itself. The second technique is to contract the temporary storage used in a computation without changing the program's semantics. Enabled by loop shifting and loop fusion, this memory reduction technique can enhance locality for two reasons: the reference window is smaller after fusion, and cache pressure is lower after array contraction. We formulate the memory reduction problem as a graph-based problem. By transforming it into a network flow problem, we show that it is solvable in polynomial time. Both techniques are implemented in a research compiler, Panorama. The experimental results demonstrate that our techniques are effective both in boosting cache utilization and in improving performance.
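As an illustration of the second technique, the following is a minimal sketch of loop shifting, loop fusion, and array contraction on a toy kernel. It is not taken from the thesis or from the compiler's implementation; the function names `original` and `fused_contracted`, the kernel itself, and the scalar names `prev` and `cur` are all assumptions chosen for the example. The second loop reads `t[i + 1]`, which has not yet been produced at iteration `i` of the first loop, so the two loops cannot be fused directly. Shifting the second loop by one iteration makes fusion legal, after which the temporary array `t` can be contracted to two scalars.

```python
def original(a):
    """Two separate loops communicating through a full-size temporary array t."""
    n = len(a)
    t = [0] * n                      # temporary array of n elements
    for i in range(n):               # first loop: produce t
        t[i] = 2 * a[i]
    b = [0] * max(n - 1, 0)
    for i in range(n - 1):           # second loop: consume t[i] and t[i + 1]
        b[i] = t[i] + t[i + 1]
    return b


def fused_contracted(a):
    """Same computation after shifting the second loop by one iteration,
    fusing it with the first, and contracting t to two scalars."""
    n = len(a)
    b = [0] * max(n - 1, 0)
    prev = 0                         # holds the value that was t[i - 1]
    for i in range(n):
        cur = 2 * a[i]               # the value that was t[i]
        if i >= 1:
            # Shifted consumer: iteration i of the fused loop performs
            # what was iteration i - 1 of the second loop.
            b[i - 1] = prev + cur
        prev = cur
    return b
```

In this sketch the temporary storage shrinks from `n` elements to two scalars, and each produced value is consumed within one iteration of its production, which is the kind of reduced reference window the abstract describes. Choosing the shift amounts for a whole program is the optimization problem the thesis addresses; here the shift of one was picked by hand.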