Towards a Systematic, Pragmatic and Architecture-Aware Program Optimization Process for Complex Processors

As processor architectures grow more complex, it becomes increasingly difficult to embed accurate machine models within compilers, and compiler efficiency tends to decrease as a result. The current trend is toward top-down approaches: static compilers are progressively augmented with information from the architecture, as in profile-based, iterative, or dynamic compilation techniques. So far, however, only fairly elementary architectural information is used. In this article, we adopt a bottom-up approach to the architecture complexity issue: we assume we know everything about the behavior of the program on the architecture. We present a manual but systematic process for optimizing a program on a complex processor architecture using extensive dynamic analysis, and we find that a small set of run-time information is sufficient to drive an efficient process. We have experimentally observed on an Alpha 21264 that this approach can yield significant performance improvements on SPEC benchmarks, beyond peak SPEC. We are currently using this approach to optimize customer applications.
