Memory Latency: to Tolerate or to Reduce?
Viktor K. Prasanna | Amol Bakshi | Jean-Luc Gaudiot | Chulho Shin | Wen-Yen Lin | Wonwoo Ro | Manil Makhija
[1] Ramesh Subramonian, et al. LogP: towards a realistic model of parallel computation, 1993, PPOPP '93.
[2] J. Tukey, et al. An algorithm for the machine calculation of complex Fourier series, 1965.
[3] Viktor K. Prasanna, et al. Dynamic data layouts for cache-conscious factorization of DFT, 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[4] Guang R. Gao, et al. On memory models and cache management for shared-memory multiprocessors, 1995, Proceedings. Seventh IEEE Symposium on Parallel and Distributed Processing.
[5] Nader Bagherzadeh, et al. Performance study of a multithreaded superscalar microprocessor, 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[6] Noah Treuhaft, et al. Scalable Processors in the Billion-Transistor Era: IRAM, 1997, Computer.
[7] D. Burger, et al. Memory Bandwidth Limitations of Future Microprocessors, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[8] Chau-Wen Tseng, et al. Data transformations for eliminating conflict misses, 1998, PLDI.
[9] Richard Crisp, et al. Direct RAMbus technology: the new main memory standard, 1997, IEEE Micro.
[10] Steven K. Reinhardt, et al. A fully associative software-managed cache design, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[11] Katherine Yelick, et al. A Case for Intelligent DRAM: IRAM, 1998.
[12] Sally A. McKee, et al. Access order and effective bandwidth for streams on a Direct Rambus memory, 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[13] Lizy Kurian John, et al. Memory Latency Effects in Decoupled Architectures, 1994, IEEE Trans. Computers.
[14] V. Cuppu, et al. A performance comparison of contemporary DRAM architectures, 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[15] Guang R. Gao, et al. A design study of the EARTH multiprocessor, 1995, PACT.
[16] Wolfgang K. Giloi, et al. MANNA: prototype of a distributed memory architecture with maximized sustained performance, 1996, Proceedings of 4th Euromicro Workshop on Parallel and Distributed Processing.
[17] W. Jalby, et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts, 1993, Supercomputing '93.
[18] Keshav Pingali, et al. I-structures: data structures for parallel computing, 1986, Graph Reduction.
[19] Dean M. Tullsen, et al. Simultaneous multithreading: Maximizing on-chip parallelism, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[20] Ali R. Hurson, et al. Effects of Multithreading on Cache Performance, 1999, IEEE Trans. Computers.
[21] Apoorv Srivastava, et al. A High-Performance, Hierarchical Decoupled Architecture, 1996.
[22] Christoforos E. Kozyrakis, et al. A New Direction for Computer Architecture Research, 1998, Computer.
[23] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[24] Fong Pong, et al. Missing the Memory Wall: The Case for Processor/Memory Integration, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[25] V. K. Prasanna-Kumar, et al. Perfect Latin squares and parallel array access, 1989, ISCA '89.
[26] Joe D. Warren, et al. The program dependence graph and its use in optimization, 1987, TOPL.
[27] Trevor Mudge, et al. DDR2 and Low Latency Variants, 2000.