Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems
暂无分享,去创建一个
[1] Ken Mai,et al. The future of wires , 2001, Proc. IEEE.
[2] Geoffrey C. Fox,et al. MapReduce for Data Intensive Scientific Analyses , 2008, 2008 IEEE Fourth International Conference on eScience.
[3] Jung Ho Ahn,et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[4] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[5] Jung Sunwoo,et al. BER Measurement of a 5.8-Gb/s/pin Unidirectional Differential I/O for DRAM Application With DIMM Channel , 2009, IEEE Journal of Solid-State Circuits.
[6] Gabriel H. Loh Nuwan Jayasena Mark H. Oskin Mark Nutter Da Ignatowski. A Processing-in-Memory Taxonomy and a Case for Studying Fixed-function PIM , 2013 .
[7] Jaeha Kim,et al. Memory-centric system interconnect design with Hybrid Memory Cubes , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[8] Doe Hyun Yoon,et al. Adaptive granularity memory systems: A tradeoff between storage efficiency and throughput , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[9] Franz Franchetti,et al. A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing , 2013, 2013 IEEE International 3D Systems Integration Conference (3DIC).
[10] Frederick A. Ware,et al. Improving Power and Data Efficiency with Threaded Memory Modules , 2006, 2006 International Conference on Computer Design.
[11] Jaejin Lee,et al. 25.2 A 1.2V 8Gb 8-channel 128GB/s high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29nm process and TSV , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[12] Dave Brown,et al. Supplementary Material for An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing , 2013 .
[13] William J. Dally,et al. GPUs and the Future of Parallel Computing , 2011, IEEE Micro.
[14] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[15] Rudy Lauwereins,et al. ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix , 2003, FPL.
[16] Frederic T. Chong,et al. Active pages: a computation model for intelligent memory , 1998, ISCA.
[17] Dave Johnson,et al. 4.8 A 28nm x86 APU optimized for power and area efficiency , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.
[18] Christoforos E. Kozyrakis,et al. Towards energy-proportional datacenter memory with mobile DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[19] Eby G. Friedman,et al. AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.
[20] Mike Ignatowski,et al. TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.
[21] Jichuan Chang,et al. BOOM: Enabling mobile memory based low-power server DIMMs , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[22] Zhao Zhang,et al. Mini-rank: Adaptive DRAM architecture for improving memory power efficiency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[23] Engin Ipek,et al. A resistive TCAM accelerator for data-intensive computing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[24] Antonia Zhai,et al. Triggered instructions: a control paradigm for spatially-programmed architectures , 2013, ISCA.
[25] Jung Ho Ahn,et al. Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os , 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[26] Aamer Jaleel,et al. Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[27] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[28] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[29] William J. Dally,et al. Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.
[30] Duncan G. Elliott,et al. Computational Ram: A Memory-simd Hybrid And Its Application To Dsp , 1992, 1992 Proceedings of the IEEE Custom Integrated Circuits Conference.
[31] Jae-Hyung Lee,et al. A 60nm 6Gb/s/pin GDDR5 Graphics DRAM with Multifaceted Clocking and ISI/SSN-Reduction Techniques , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[32] Christoforos E. Kozyrakis,et al. A case for intelligent RAM , 1997, IEEE Micro.
[33] Luca Benini,et al. A Logic-base Interconnect for Supporting Near Memory Computation in the Hybrid Memory Cube , 2014 .
[34] Onur Mutlu,et al. Adaptive-latency DRAM: Optimizing DRAM timing for the common-case , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[35] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[36] Michael M. Swift,et al. Efficient virtual memory for big memory servers , 2013, ISCA.
[37] Stéphan Jourdan,et al. Haswell: The Fourth-Generation Intel Core Processor , 2014, IEEE Micro.
[38] Feifei Li,et al. NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[39] Feifei Li,et al. Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads , 2014, IEEE Micro.
[40] Raphael Landaverde,et al. An investigation of Unified Memory Access performance in CUDA , 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).
[41] Serge J. Belongie,et al. SD-VBS: The San Diego Vision Benchmark Suite , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[42] Manu Awasthi. Rethinking Design Metrics for Datacenter DRAM , 2015, MEMSYS.
[43] Jung Ho Ahn,et al. The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing , 2013, TACO.
[44] Michael F. Deering,et al. FBRAM: a new form of memory optimized for 3D graphics , 1994, SIGGRAPH.
[45] David P. Luebke,et al. CUDA: Scalable parallel programming for high-performance scientific computing , 2008, 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro.
[46] Doe Hyun Yoon,et al. Virtualized and flexible ECC for main memory , 2010, ASPLOS XV.
[47] David H. Albonesi,et al. ReMAP: A Reconfigurable Heterogeneous Multicore Architecture , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[48] Chun Chen,et al. The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.
[49] Ming Yang,et al. Sonic Millip3De: A massively parallel 3D-stacked accelerator for 3D ultrasound , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[50] Noah Treuhaft,et al. Intelligent RAM (IRAM): the industrial setting, applications, and architectures , 1997, Proceedings International Conference on Computer Design VLSI in Computers and Processors.
[51] Mike Ignatowski,et al. High-level Programming Model Abstractions for Processing in Memory , 2013 .
[52] Reum Oh,et al. Design technologies for a 1.2V 2.4Gb/s/pin high capacity DDR4 SDRAM with TSVs , 2014, 2014 Symposium on VLSI Circuits Digest of Technical Papers.
[53] Mor Harchol-Balter,et al. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[54] Shekhar Borkar,et al. Role of Interconnects in the Future of Computing , 2013, Journal of Lightwave Technology.
[55] Seung-Moon Yoo,et al. FlexRAM: Toward an advanced Intelligent Memory system , 1999, 2012 IEEE 30th International Conference on Computer Design (ICCD).
[56] Nam Sung Kim,et al. Reevaluating the latency claims of 3D stacked memories , 2013, 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC).