Compiling for instruction cache performance on a multithreaded architecture

Instruction-cache-aware compilation seeks to lay out a program in memory so that cache conflicts between procedures are minimized. It does so using profile-driven knowledge of procedure invocation patterns. On a multithreaded architecture, however, more conflicts may arise between threads than between procedures on the same thread. This research examines opportunities for the compiler to optimize instruction cache layout on a multithreaded architecture. We examine two scenarios: (1) the compiler knows which programs will be, or are likely to be, co-scheduled; and (2) the compiler has no knowledge at compile time of which applications will be co-scheduled. We present solutions for both environments.
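The inter-thread conflicts described above can be illustrated with a small model. The sketch below is not the authors' algorithm. It assumes a direct-mapped instruction cache, procedure start addresses and sizes in bytes, and profile-derived execution counts as weights; the names `Procedure`, `cache_sets`, `conflict_score`, `shift`, and `best_offset` are hypothetical. It scores how heavily the hot procedures of two co-scheduled threads map onto the same cache sets, then searches for a line-aligned base offset for the second program that minimizes that score, roughly in the spirit of scenario (1), where the co-scheduled program is known at compile time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Procedure:
    name: str
    start: int   # byte address of the first instruction
    size: int    # code size in bytes
    weight: int  # profiled execution count (assumed metric, for illustration only)


def cache_sets(proc, line_size, num_sets):
    """Return the set indices a procedure occupies in a direct-mapped cache."""
    first_line = proc.start // line_size
    last_line = (proc.start + proc.size - 1) // line_size
    return {line % num_sets for line in range(first_line, last_line + 1)}


def conflict_score(thread_a, thread_b, line_size, num_sets):
    """Profile-weighted count of cache sets shared by procedures of two threads."""
    score = 0
    for pa in thread_a:
        sets_a = cache_sets(pa, line_size, num_sets)
        for pb in thread_b:
            shared = sets_a & cache_sets(pb, line_size, num_sets)
            score += pa.weight * pb.weight * len(shared)
    return score


def shift(thread, offset):
    """Relocate every procedure of a program by the same byte offset."""
    return [Procedure(p.name, p.start + offset, p.size, p.weight) for p in thread]


def best_offset(thread_a, thread_b, line_size, num_sets):
    """Try each line-aligned base offset for thread_b within one cache-sized window."""
    best = None
    for k in range(num_sets):
        offset = k * line_size
        score = conflict_score(thread_a, shift(thread_b, offset), line_size, num_sets)
        if best is None or score < best[1]:
            best = (offset, score)
    return best  # (offset in bytes, resulting conflict score)
```

For example, with 8 sets of 32-byte lines, two 64-byte hot procedures both placed at address 0 collide in sets 0 and 1; shifting the second program by 64 bytes moves it to sets 2 and 3 and drives the estimated conflict to zero. A real layout pass would also weigh intra-thread conflicts and account for set associativity, which this toy model ignores.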
