A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck
Erik Brockmeyer | Francky Catthoor | Dimitrios Soudris | Minas Dasygenis | Adonios Thanailakis | Bart Durinck
[1] Gauthier Lafruit, et al. The Local Wavelet Transform: a memory-efficient, high-speed architecture optimized to a Region-Oriented Zero-Tree coder, 2000, Integr. Comput. Aided Eng.
[2] Jean-Loup Baer, et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[3] Daniel A. Connors, et al. Compiler-directed content-aware prefetching for dynamic data structures, 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[4] Pen-Chung Yew, et al. Data Prefetching In Shared Memory Multiprocessors, 1987, ICPP.
[5] Hugo De Man, et al. Platform Independent Data Transfer and Storage Exploration Illustrated on Parallel Cavity Detection Algorithm, 1999, PDPTA.
[6] Luc Van Gool, et al. One-shot active 3D shape acquisition, 1996, Proceedings of 13th International Conference on Pattern Recognition.
[7] Konstantinos Konstantinides, et al. Image and video compression standards, 1995.
[8] Josep Torrellas, et al. Improving the data cache performance of multiprocessor operating systems, 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[9] Erik Brockmeyer, et al. Data reuse analysis technique for software-controlled memory hierarchies, 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.
[10] Henk Corporaal, et al. Layer assignment techniques for low energy in multi-layered memory organisations, 2003.
[11] Francky Catthoor, et al. Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design, 1998.
[12] Jason Fritts. Multi-level memory prefetching for media and stream processing, 2002, Proceedings. IEEE International Conference on Multimedia and Expo.
[13] Anoop Gupta, et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.
[14] Alan Jay Smith, et al. Sequential Program Prefetching in Memory Hierarchies, 1978, Computer.
[15] Alexander V. Veidenbaum, et al. An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors, 1994, 1994 International Conference on Parallel Processing Vol. 2.
[16] Tien-Fu Chen, et al. Alternative implementations of hybrid branch predictors, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[17] Kathryn S. McKinley, et al. Guided region prefetching: a cooperative hardware/software approach, 2003, ISCA '03.
[18] Hugo De Man, et al. Minimizing the required memory bandwidth in VLSI system realizations, 1999, IEEE Trans. Very Large Scale Integr. Syst.
[19] David J. Sager, et al. The microarchitecture of the Pentium 4 processor, 2001.
[20] Kenneth C. Yeager. The Mips R10000 superscalar microprocessor, 1996, IEEE Micro.
[21] Ken Kennedy, et al. Software methods for improvement of cache performance on supercomputer applications, 1989.
[22] Xiaotong Zhuang, et al. A hardware-based cache pollution filtering mechanism for aggressive prefetches, 2003, 2003 International Conference on Parallel Processing, 2003. Proceedings.
[23] Norman P. Jouppi, et al. CACTI 3.0: an integrated cache timing, power, and area model, 2001.
[24] Todd C. Mowry, et al. Tolerating latency through software-controlled data prefetching, 1994.
[25] Wei-Chung Hsu, et al. Data Prefetching On The HP PA-8000, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[26] Derek Chiou, et al. Scheduler-Based prefetching for Multilevel Memories, 2001.
[27] Frank Vahid, et al. Prefetching for improved bus wrapper performance in cores, 2002, TODE.
[28] Ken Kennedy, et al. Software prefetching, 1991, ASPLOS IV.
[29] Erik Brockmeyer, et al. Layer assignment techniques for low power in multi-layered memory organisations, 2003.
[30] Anoop Gupta, et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors, 1991, J. Parallel Distributed Comput.
[31] Konstantinos Konstantinides, et al. Image and Video Compression Standards: Algorithms and Architectures, 1997.
[32] Rita Cucchiara, et al. Improving Data Prefetching Efficacy in Multimedia Applications, 2003, Multimedia Tools and Applications.
[33] Th. Zahariadis, et al. A spiral search algorithm for fast estimation of block motion vectors, 1996, 1996 8th European Signal Processing Conference (EUSIPCO 1996).
[34] Chi-Keung Luk, et al. Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors, 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[35] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[36] Yale N. Patt, et al. An effective programmable prefetch engine for on-chip caches, 1995, MICRO 1995.
[37] Young Serk Shim, et al. A fast hierarchical motion vector estimation algorithm using mean pyramid, 1995, IEEE Trans. Circuits Syst. Video Technol.