On reducing load/store latencies of cache accesses
[1] Aneesh Aggarwal. Reducing latencies of pipelined cache accesses through set prediction , 2005, ICS '05.
[2] Eduard Ayguadé,et al. Dynamic memory instruction bypassing , 2003, ICS '03.
[3] Stéphan Jourdan,et al. Early load address resolution via register tracking , 2000, ISCA '00.
[4] Kunle Olukotun,et al. Multilevel Optimization of Pipelined Caches , 1997, IEEE Trans. Computers.
[5] David A. Patterson,et al. Computer Architecture, Fifth Edition: A Quantitative Approach , 2011 .
[6] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[7] Todd M. Austin,et al. SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.
[8] Gerry Kane,et al. MIPS R2000 RISC architecture , 1987 .
[9] Todd C. Mowry,et al. Tolerating latency in multiprocessors through compiler-inserted prefetching , 1998, TOCS.
[10] Chung-Ping Chung,et al. Early load: Hiding load latency in deep pipeline processor , 2008, 2008 13th Asia-Pacific Computer Systems Architecture Conference.
[11] Todd C. Mowry,et al. Architectural and compiler support for effective instruction prefetching: a cooperative approach , 2001, TOCS.
[12] A. Nicolau,et al. Reducing data cache energy consumption via cached load/store queue , 2003, Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.
[13] Milo M. K. Martin,et al. Scalable store-load forwarding via store queue index prediction , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[14] Jia-Jhe Li,et al. Snug set-associative caches: Reducing leakage power of instruction and data caches with no performance penalties , 2007, TACO.
[15] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[17] Uri C. Weiser,et al. Correlated load-address predictors , 1999, ISCA.
[18] Craig B. Zilles,et al. Decomposing the load-store queue by function for power reduction and scalability , 2006, IBM J. Res. Dev.
[19] Donald Yeung,et al. A study of source-level compiler algorithms for automatic construction of pre-execution code , 2004, TOCS.
[20] Erik R. Altman,et al. Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture , 2002, MICRO 2002.
[21] Jia-Jhe Li,et al. Snug set-associative caches: Reducing leakage power while improving performance , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.
[22] David A. Patterson,et al. Computer Organization and Design, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) , 2008 .
[23] Narayanan Vijaykrishnan,et al. On load latency in low-power caches , 2003, ISLPED '03.
[24] Alexander V. Veidenbaum,et al. Reducing data cache energy consumption via cached load/store queue , 2003, ISLPED '03.
[25] Narayanan Vijaykrishnan,et al. Exploiting temporal loads for low latency and high bandwidth memory , 2005 .
[26] Chuanjun Zhang. Reducing cache misses through programmable decoders , 2008, TACO.
[27] Dionisios N. Pnevmatikatos,et al. Streamlining data cache access with fast address calculation , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[28] Dirk Grunwald,et al. Predictive sequential associative cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[29] Brad Calder,et al. Pointer cache assisted prefetching , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[30] Narayanan Vijaykrishnan,et al. Reducing non-deterministic loads in low-power caches via early cache set resolution , 2007, Microprocess. Microsystems.
[31] Donald J. Patterson,et al. Computer organization and design: the hardware-software interface (appendix a) , 1993 .
[32] Chung-Ho Chen,et al. Microarchitecture support for improving the performance of load target prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[33] Lu Peng,et al. Signature buffer: bridging performance gap between registers and caches , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[34] T. N. Vijaykumar,et al. Reducing design complexity of the load/store queue , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.