In this paper, we describe a compilation system that automates much of the performance tuning that application programmers interested in high performance currently do by hand. Our approach combines compiler models and heuristics with guided empirical search to take advantage of their complementary strengths. The models and heuristics limit the search to a small number of candidate implementations, and the empirical results give the compiler the most accurate information for selecting among candidates and tuning optimization parameter values. The overall approach can be employed to alleviate some of the performance problems that lead to inefficiencies in key applications today: register pressure, cache conflict misses, and the trade-off between synchronization, parallelism, and locality in SMPs. The main focus of the paper is an algorithm for simultaneously optimizing across multiple levels of the memory hierarchy for dense-matrix computations. We have developed an initial compiler implementation and present automatically generated results for matrix multiply. Results on two architectures, SGI R10000 and Sun UltraSparc IIe, outperform the native compiler, and either outperform or achieve performance comparable to the ATLAS self-tuning library and the hand-tuned vendor BLAS library. This paper also describes other components of the ECO system, including supporting tools and experiments with programmer-guided performance tuning. This approach has provided a foundation for a general framework for systematic optimization of domain-specific applications. Specifically, we are developing an optimization system for signal and image processing that exploits signal properties, and we are using machine learning and a knowledge-rich representation to optimize molecular dynamics simulation.
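The interplay between a model that prunes the search space and timing runs that pick the winner can be illustrated with a minimal Python sketch for tiled matrix multiply. This is not the ECO algorithm: the function names (`tiled_matmul`, `model_candidates`, `empirical_search`), the single-level square tiling, and the "three tiles must fit in cache" rule are all assumptions made for illustration, and the cache capacity is a hypothetical parameter rather than a measured one.

```python
import time


def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def model_candidates(n, cache_elems, all_tiles):
    """Model step (assumed rule): keep tile sizes whose three tiles
    (one each from A, B, and C) fit in a cache holding cache_elems elements."""
    return [t for t in all_tiles if t <= n and 3 * t * t <= cache_elems]


def empirical_search(n, candidates, repeats=3):
    """Empirical step: time each surviving candidate and return the fastest,
    together with the best observed time per candidate."""
    a = [[float((i + j) % 7) for j in range(n)] for i in range(n)]
    b = [[float((i * j) % 5) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tiled_matmul(a, b, n, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    n = 64
    # Hypothetical cache capacity of 768 elements, chosen only for the demo.
    candidates = model_candidates(n, cache_elems=768, all_tiles=[4, 8, 16, 32, 64])
    best_tile, timings = empirical_search(n, candidates)
    print("candidates:", candidates)
    print("selected tile size:", best_tile)
```

The model step discards tile sizes that cannot plausibly perform well, so the comparatively expensive timing runs cover only a few variants. Which tile size wins depends on the machine and, in an interpreted language, only loosely on the cache, so the sketch shows the structure of the search rather than realistic speedups.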