Paper information - Compiling for instruction cache performance on a multithreaded architecture

Compiling for instruction cache performance on a multithreaded architecture

Instruction-cache-aware compilation seeks to lay out a program in memory in such a way that cache conflicts between procedures are minimized. It does this through profile-driven knowledge of procedure invocation patterns. On a multithreaded architecture, however, more conflicts may arise between threads than between procedures on the same thread. This research examines opportunities for the compiler to optimize instruction cache layout on a multithreaded architecture. We examine scenarios where (1) the compiler has knowledge about multiple programs that will be or are likely to be co-scheduled, and where (2) the compiler has no knowledge at compile time of which applications will be co-scheduled. We present solutions for both environments.
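To make the notion of a cache conflict concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a direct-mapped instruction cache with 64-byte lines and 256 sets that is indexed by the addresses given, and it counts how many cache sets two procedures share. The function names, cache parameters, and addresses are all hypothetical.

```python
# Illustrative sketch only: cache geometry and addresses are assumptions.

def cache_sets(start, size, line_size=64, num_sets=256):
    """Return the set indices a procedure occupies in a direct-mapped cache."""
    first_line = start // line_size
    last_line = (start + size - 1) // line_size
    return {line % num_sets for line in range(first_line, last_line + 1)}


def overlapping_sets(proc_a, proc_b, line_size=64, num_sets=256):
    """Count the cache sets that two (start, size) procedures both map to."""
    sets_a = cache_sets(*proc_a, line_size=line_size, num_sets=num_sets)
    sets_b = cache_sets(*proc_b, line_size=line_size, num_sets=num_sets)
    return len(sets_a & sets_b)


# Hypothetical hot procedures from two co-scheduled threads, 1 KiB each.
thread_a_proc = (0x0000, 1024)
thread_b_proc = (0x4000, 1024)          # exactly one cache size (16 KiB) away
thread_b_moved = (0x4000 + 1024, 1024)  # same procedure shifted by 1 KiB

print(overlapping_sets(thread_a_proc, thread_b_proc))
print(overlapping_sets(thread_a_proc, thread_b_moved))
```

Under these assumptions the two procedures collide on every set they touch when they are placed one cache size apart, and on none once one of them is shifted by its own length. A layout pass that knows which programs will be co-scheduled could use a measure like this to choose placements, whereas without that knowledge it can only reason about likely conflicts.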

Dean M. Tullsen | Rakesh Kumar
