A novel access pattern-based multi-core memory architecture