Compiler Techniques for Reducing Data Cache Miss Rate on a Multithreaded Architecture

High-performance embedded architectures will in some cases combine simple caches and multithreading, two techniques that increase energy efficiency and performance at the same time. However, that combination can produce high and unpredictable cache miss rates, even when the compiler optimizes the data layout of each program for the cache. This paper examines data-cache-aware compilation for multithreaded architectures. Data-cache-aware compilation finds a layout for data objects that minimizes inter-object conflict misses. This research extends and adapts prior cache-conscious data layout optimizations to the much more difficult environment of multithreaded architectures. Solutions are presented for two computing scenarios: (1) the more general case, where an application may be co-scheduled with any other applications, and (2) the case where the co-scheduled working set is more precisely known.
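
To make the idea of a layout that minimizes inter-object conflict misses concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a small direct-mapped cache and a greedy heuristic, and every name in it (`LINE_SIZE`, `NUM_SETS`, `sets_touched`, `greedy_layout`, `initial_pressure`) is hypothetical. Objects are placed in order of decreasing access weight, each at the line-aligned offset whose cache sets carry the least accumulated weight so far. The optional `initial_pressure` argument is one assumed way to model scenario (2), where the footprint of a co-scheduled thread is known in advance.

```python
from collections import defaultdict

LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 8     # direct-mapped cache with 8 sets, 512 B total (assumed)

def sets_touched(start, size):
    """Return the cache sets covered by an object placed at byte address `start`."""
    first_line = start // LINE_SIZE
    last_line = (start + size - 1) // LINE_SIZE
    return {line % NUM_SETS for line in range(first_line, last_line + 1)}

def greedy_layout(objects, initial_pressure=None):
    """Place objects so that heavily used objects avoid sharing cache sets.

    objects: list of (name, size_in_bytes, access_weight)
    initial_pressure: optional {set_index: weight} describing sets already
        used by a co-scheduled thread (hypothetical input).
    Returns {name: start_address}.
    """
    pressure = defaultdict(int, initial_pressure or {})
    layout = {}
    next_free = 0
    # Place the most frequently accessed objects first.
    for name, size, weight in sorted(objects, key=lambda o: -o[2]):
        best_start, best_cost = None, None
        # Try every line-aligned offset within one cache-sized window.
        for k in range(NUM_SETS):
            start = next_free + k * LINE_SIZE
            cost = sum(pressure[s] for s in sets_touched(start, size))
            if best_cost is None or cost < best_cost:
                best_start, best_cost = start, cost
        layout[name] = best_start
        for s in sets_touched(best_start, size):
            pressure[s] += weight
        # Advance past the object, rounded up to the next line boundary.
        end = best_start + size
        next_free = ((end + LINE_SIZE - 1) // LINE_SIZE) * LINE_SIZE
    return layout

if __name__ == "__main__":
    objs = [("a", 256, 10), ("b", 256, 8), ("c", 64, 1)]
    print(greedy_layout(objs))
```

In this sketch, padding inserted before an object is the only mechanism for steering it to a different cache set. A real compiler pass would also have to bound that padding and derive the access weights from profiling.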
