An effective cache overlapping storage structure for SMT processors

Simultaneous multithreaded (SMT) processors improve instruction throughput by fetching and executing instructions from several threads in a single cycle. As the number of running threads grows, however, the performance of each individual thread degrades steadily. A main cause is competition among the threads for the limited cache resources, which dramatically increases the number of cache misses each thread suffers. In this paper, we propose an effective overlapping cache storage structure for SMT processors. The key idea is to encode the value of a datum with a few bits: if the value is a frequent value, it can be read or written through the encoding scheme and a table of frequent values. Because all frequent values are encoded in a fixed way, the bytes of a cache line that hold frequent values can also store, at the same positions, the values of a different line that maps to the same physical hardware. We find that this overlapping structure helps SMT processors alleviate cache contention among threads and raises their cache hit rates. Compared with other schemes, ours needs less hardware or organizes the cache in a different way, and it can read and write without adding much latency; its implementation is simple. Execution-driven simulation results show that, with only a small amount of hardware added to the original cache, our structure achieves performance in an SMT processor close to that of a cache of twice the size.
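
To make the encoding idea concrete, the following C sketch models one cache line under a frequent-value scheme of this kind. It is only an illustrative model, not the paper's implementation: the 8-entry frequent-value table, the 3-bit codes, the per-word flag, and all names (FVT_SIZE, fvt_lookup, stored_word_t, and so on) are assumptions chosen for the example. It shows the write path (encode a value as an FVT index when possible, otherwise store it raw) and the read path (decode through the table), and counts how many word slots are freed for an overlapping line.

```c
/*
 * Minimal sketch (not the authors' implementation) of a frequent-value
 * encoded cache line. FVT_SIZE, the table contents, and the per-word
 * "encoded" flag are illustrative assumptions, not details from the paper.
 */
#include <stdint.h>
#include <stdio.h>

#define FVT_SIZE        8   /* 8 frequent values -> 3-bit codes (assumed) */
#define WORDS_PER_LINE  8

/* Frequent-value table: assumed to be filled by profiling or learned at run time. */
static const uint32_t fvt[FVT_SIZE] = {
    0x00000000, 0xFFFFFFFF, 0x00000001, 0x00000010,
    0x0000FFFF, 0x80000000, 0x7FFFFFFF, 0x000000FF
};

/* Return the FVT index of v, or -1 if v is not a frequent value. */
static int fvt_lookup(uint32_t v) {
    for (int i = 0; i < FVT_SIZE; i++)
        if (fvt[i] == v) return i;
    return -1;
}

/* One logical word as held in the physical line. */
typedef struct {
    uint8_t  encoded;  /* 1: 'code' is an FVT index, byte lanes reusable */
    uint8_t  code;     /* FVT index when encoded == 1                    */
    uint32_t raw;      /* full value when encoded == 0                   */
} stored_word_t;

/* Write path: try to encode through the FVT, else store the raw value. */
static void cache_write_word(stored_word_t *w, uint32_t value) {
    int idx = fvt_lookup(value);
    if (idx >= 0) { w->encoded = 1; w->code = (uint8_t)idx; }
    else          { w->encoded = 0; w->raw  = value; }
}

/* Read path: decode through the FVT or return the raw bytes. */
static uint32_t cache_read_word(const stored_word_t *w) {
    return w->encoded ? fvt[w->code] : w->raw;
}

int main(void) {
    stored_word_t line[WORDS_PER_LINE];
    uint32_t data[WORDS_PER_LINE] = {0, 42, 0xFFFFFFFF, 7, 0, 1, 0xDEADBEEF, 0};

    int freed_words = 0;
    for (int i = 0; i < WORDS_PER_LINE; i++) {
        cache_write_word(&line[i], data[i]);
        if (line[i].encoded) freed_words++;  /* byte lanes now reusable */
    }

    /* The freed byte lanes could hold words of a second line mapping to the
       same physical line, which is the overlapping effect the paper exploits
       to raise per-thread hit rates under SMT. */
    printf("%d of %d word slots freed for an overlapping line\n",
           freed_words, WORDS_PER_LINE);

    for (int i = 0; i < WORDS_PER_LINE; i++)
        if (cache_read_word(&line[i]) != data[i]) { puts("mismatch"); return 1; }
    puts("all words read back correctly");
    return 0;
}
```

In this toy run, the words holding 0, 1, and 0xFFFFFFFF encode into short FVT indices, so their byte lanes are available to a companion line; the infrequent values (42, 7, 0xDEADBEEF) continue to occupy their lanes in full.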
