Reducing the Traffic of Loop-Based Programs Using a Prefetch Processor
[1] D. Burger, et al. Memory Bandwidth Limitations of Future Microprocessors, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[2] Compilation Techniques, et al. Parallel architectures and compilation techniques, 1995.
[3] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, CARN.
[4] Joseph A. Fisher, et al. Very Long Instruction Word architectures and the ELI-512, 1983, ISCA '83.
[5] Apoorv Srivastava, et al. A High-Performance, Hierarchical Decoupled Architecture, 1996.
[6] Lizy Kurian John, et al. Memory Latency Effects in Decoupled Architectures, 1994, IEEE Trans. Computers.
[7] Tien-Fu Chen, et al. Alternative implementations of hybrid branch predictors, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[8] Alan R. Jones, et al. Fast Fourier Transform, 1970, SIGP.
[9] Todd C. Mowry, et al. Tolerating latency through software-controlled data prefetching, 1994.
[10] Gary S. Tyson, et al. A study of single-chip processor/cache organizations for large numbers of transistors, 1994, ISCA '94.
[11] Jean-Loup Baer, et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[12] Ian Watson, et al. Decoupled pre-fetching for distributed shared memory, 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.
[13] R.H. Dennard, et al. Design of Ion-implanted MOSFET's with Very Small Physical Dimensions, 1974, Proceedings of the IEEE.
[14] Yale N. Patt, et al. An effective programmable prefetch engine for on-chip caches, 1995, MICRO 1995.
[15] A. Gupta, et al. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results, 1989, ISCA '89.
[16] Daeyeon Park, et al. Improving the effectiveness of software prefetching with adaptive executions, 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.
[17] Anoop Gupta, et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.
[18] D. Munson. Circuits and systems, 1982, Proceedings of the IEEE.
[19] Michael E. Wolf, et al. Improving locality and parallelism in nested loops, 1992.