Custom data layout for memory parallelism
[1] Maya Gokhale,et al. NAPA C: compiling for a hybrid RISC/FPGA architecture , 1998, Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251).
[2] Wayne Luk,et al. Memory Access Optimization and RAM Inference for Pipeline Vectorization , 1999, FPL.
[3] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[4] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[5] Daniel M. Lavery,et al. Optimizations to prevent cache penalties for the Intel® Itanium® 2 processor , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[6] Francky Catthoor,et al. Fast and extensive system-level memory exploration for ATM applications , 1997, Proceedings. Tenth International Symposium on System Synthesis (Cat. No.97TB100114).
[7] William J. Dally,et al. Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.
[8] Frank Pfenning,et al. A type theory for memory allocation and data layout , 2003, POPL '03.
[9] Ken Kennedy,et al. Automatic Data Layout for High Performance Fortran , 1995, SC.
[10] Praveen K. Murthy,et al. Buffer merging—a powerful technique for reducing memory requirements of synchronous dataflow specifications , 2004, TODE.
[11] Saman P. Amarasinghe,et al. Maps: a compiler-managed memory system for Raw machines , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[12] Dror Rawitz,et al. The hardness of cache conscious data placement , 2002, POPL '02.
[13] Wei Li,et al. Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.
[14] Tom Keller,et al. Tera-Op Reliable Intelligently Adaptive Processing System (TRIPS) , 2004.
[15] Santosh Pande,et al. A framework for parallelizing load/stores on embedded processors , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.
[16] Csaba Andras Moritz,et al. Parallelizing applications into silicon , 1999, Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375).
[17] Viktor K. Prasanna,et al. Latin Squares for Parallel Array Access , 1993, IEEE Trans. Parallel Distributed Syst.
[18] Monica S. Lam,et al. Automatic computation and data decomposition for multiprocessors , 1997.
[19] Jean-Francois Collard,et al. Optimizations to prevent cache penalties for the Intel® Itanium® 2 Processor , 2003, CGO.
[20] Sally A. McKee,et al. Design of a parallel vector access unit for SDRAM memory systems , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[21] Pedro C. Diniz,et al. A compiler approach to fast hardware design space exploration in FPGA-based systems , 2002, PLDI '02.
[22] Manish Gupta,et al. Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers , 1992, IEEE Trans. Parallel Distributed Syst.
[23] Rastislav Bodík,et al. An efficient profile-analysis framework for data-layout optimizations , 2002, POPL '02.
[24] Herman Schmit,et al. Address generation for memories containing multiple arrays , 1998, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
[25] Nikil D. Dutt,et al. Access pattern based local memory customization for low power embedded systems , 2001, Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001.
[26] Maurice V. Wilkes,et al. The memory wall and the CMOS end-point , 1995, CARN.
[27] Maya Gokhale,et al. Automatic allocation of arrays to memories in FPGA processors with multiple memory banks , 1999, Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375).
[28] Sally A. McKee,et al. Access ordering and memory-conscious cache utilization , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[29] Praveen K. Murthy,et al. A buffer merging technique for reducing memory requirements of synchronous dataflow specifications , 1999, Proceedings 12th International Symposium on System Synthesis.