3D-Stacked Memory-Side Acceleration: Accelerator and System Design
Tze Meng Low | Qi Guo | Nikolaos Alachiotis | Berkin Akin | F. Sadi | Guang Xu | L. Pileggi | J. Hoe | F. Franchetti
[1] Maya Gokhale,et al. Processing in Memory: The Terasys Massively Parallel PIM Array , 1995, Computer.
[2] Frederic T. Chong,et al. Active pages: a computation model for intelligent memory , 1998, ISCA.
[3] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[4] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[5] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Steven Swanson,et al. Conservation cores: reducing the energy of mature computations , 2010, ASPLOS XV.
[7] Bradford M. Beckmann,et al. The gem5 simulator , 2011, CARN.
[8] Jung Ho Ahn,et al. CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[9] J. Jeddeloh,et al. Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[10] Shirley Moore,et al. Measuring Energy and Power with PAPI , 2012, 2012 41st International Conference on Parallel Processing Workshops.
[11] Michael Bedford Taylor,et al. Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse , 2012, DAC Design Automation Conference 2012.
[12] Kenneth A. Ross,et al. Navigating big data with high-throughput, energy-efficient data partitioning , 2013, ISCA.
[13] Andrey Vladimirov. Multithreaded Transposition of Square Matrices with Common Code for Intel Xeon Processors and Intel Xeon Phi Coprocessors , 2013.
[14] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.
[15] Thomas F. Wenisch,et al. Thin servers with smart pipes: designing SoC accelerators for memcached , 2013, ISCA.
[16] Mike Ignatowski,et al. High-level Programming Model Abstractions for Processing in Memory , 2013.
[17] Babak Falsafi,et al. Meet the walkers accelerating index traversals for in-memory databases , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Franz Franchetti,et al. Understanding the design space of DRAM-optimized hardware FFT accelerators , 2014, 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors.
[19] Ronald G. Dreslinski,et al. Sources of error in full-system simulation , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[20] Tianshi Chen,et al. ArchRanker: A ranking approach to design space exploration , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[21] Feifei Li,et al. NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[22] Doru-Thom Popovici,et al. Algorithm/hardware co-optimized SAR image reconstruction with 3D-stacked logic in memory , 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).
[23] Mike Ignatowski,et al. TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.
[24] Feifei Li,et al. Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads , 2014, IEEE Micro.
[25] Franz Franchetti,et al. HAMLeT: Hardware accelerated memory layout transform within 3D-stacked DRAM , 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).