Optimizing Cache Access: A Tool for Source-to-Source Transformations and Real-Life Compiler Tests

Loop transformations are a well-known and effective means of improving performance by optimizing cache access. Applying them automatically, however, is a complex and challenging task, especially for parallel codes. Since the end of the 1980s, most compiler vendors have promised that these features will be implemented, always in the next release. We tested current Fortran 90 compilers (on IBM, Intel, and SGI hardware) for their capabilities in this field, and this paper presents the results of our analysis. Motivated by this experience, we developed the optimization environment Goofi, which assists programmers in applying loop transformations to their code so that parallel codes can achieve better performance even today.
