NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules
[1] Duncan G. Elliott,et al. Computational Ram: A Memory-simd Hybrid And Its Application To Dsp , 1992, 1992 Proceedings of the IEEE Custom Integrated Circuits Conference.
[2] Michael F. Deering,et al. FBRAM: a new form of memory optimized for 3D graphics , 1994, SIGGRAPH.
[3] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[4] John Wawrzynek,et al. Garp: a MIPS processor with a reconfigurable coprocessor , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186).
[5] Noah Treuhaft,et al. Intelligent RAM (IRAM): the industrial setting, applications, and architectures , 1997, Proceedings International Conference on Computer Design VLSI in Computers and Processors.
[6] Christoforos E. Kozyrakis,et al. A case for intelligent RAM , 1997, IEEE Micro.
[7] Frederic T. Chong,et al. Active pages: a computation model for intelligent memory , 1998, ISCA.
[8] Andreas Moshovos,et al. CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit , 2000, ISCA '00.
[9] William J. Dally,et al. Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.
[10] John Wawrzynek,et al. The Garp Architecture and C Compiler , 2000, Computer.
[11] Alvin R. Lebeck,et al. Power aware page allocation , 2000, SIGP.
[12] Reiner W. Hartenstein,et al. A decade of reconfigurable computing: a visionary retrospective , 2001, Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001.
[13] Reiner W. Hartenstein. Coarse grain reconfigurable architectures , 2001, Proceedings of the ASP-DAC 2001. Asia and South Pacific Design Automation Conference 2001 (Cat. No.01EX455).
[14] Chun Chen,et al. The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.
[15] Scott Hauck,et al. Reconfigurable computing: a survey of systems and software , 2002, CSUR.
[16] A. Tsai,et al. PipeRench: A virtualized programmable datapath in 0.18 micron technology , 2002, Proceedings of the IEEE 2002 Custom Integrated Circuits Conference (Cat. No.02CH37285).
[17] Rudy Lauwereins,et al. ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix , 2003, FPL.
[18] Won-Jong Lee,et al. A Scalable GPU Architecture based on Dynamically Reconfigurable Embedded Processor , 2005 .
[19] Josep Torrellas,et al. A Near-Memory Processor for Vector, Streaming and Bit Manipulation Workloads , 2005 .
[20] William J. Dally,et al. Scatter-add in data parallel architectures , 2005, 11th International Symposium on High-Performance Computer Architecture.
[21] Seth Copen Goldstein,et al. Tartan: evaluating spatial computation for whole program execution , 2006, ASPLOS XII.
[22] Soo-In Cho,et al. A 512-Mb DDR3 SDRAM prototype with CIO minimization and self-calibration techniques , 2006, VLSIC 2006.
[23] A Survey of Multi-Core Coarse-Grained Reconfigurable Arrays for Embedded Applications , 2007 .
[24] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[25] Geoffrey C. Fox,et al. MapReduce for Data Intensive Scientific Analyses , 2008, 2008 IEEE Fourth International Conference on eScience.
[26] Hsien-Hsin S. Lee,et al. POD: A 3D-Integrated Broad-Purpose Acceleration Layer , 2008, IEEE Micro.
[27] Serge J. Belongie,et al. SD-VBS: The San Diego Vision Benchmark Suite , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[28] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[29] So-Ra Kim,et al. 8Gb 3D DDR3 DRAM using through-silicon-via technology , 2009, 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[30] G. Edward Suh,et al. Flexible and Efficient Instruction-Grained Run-Time Monitoring Using On-Chip Reconfigurable Fabric , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[31] Young-Hyun Jun,et al. 8 Gb 3-D DDR3 DRAM Using Through-Silicon-Via Technology , 2009, IEEE Journal of Solid-State Circuits.
[32] David H. Albonesi,et al. ReMAP: A Reconfigurable Heterogeneous Multicore Architecture , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[33] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[34] Bradford M. Beckmann,et al. The gem5 simulator , 2011, CARN.
[35] Zhen Fang,et al. Active memory controller , 2012, The Journal of Supercomputing.
[36] Karthikeyan Sankaralingam,et al. Dynamically Specialized Datapaths for energy efficient computing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[37] Young-Hyun Jun,et al. A 1.2V 12.8GB/s 2Gb mobile Wide-I/O DRAM with 4×128 I/Os using TSV-based stacking , 2011, 2011 IEEE International Solid-State Circuits Conference.
[38] Tomonori Sekiguchi,et al. 1-Tbyte/s 1-Gbit DRAM Architecture Using 3-D Interconnect for High-Throughput Computing , 2011, IEEE Journal of Solid-State Circuits.
[39] J. Thomas Pawlowski,et al. Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[40] Michael C. Huang,et al. Efficient data streaming with on-chip accelerators: Opportunities and challenges , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[41] William J. Dally,et al. GPUs and the Future of Parallel Computing , 2011, IEEE Micro.
[42] Engin Ipek,et al. A resistive TCAM accelerator for data-intensive computing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[43] Pankaj Shailendra Gode,et al. Function inlining and loop unrolling for loop acceleration in reconfigurable processors , 2012, CASES '12.
[44] Hsien-Hsin S. Lee,et al. 3D-MAPS: 3D Massively parallel processor with stacked memory , 2012, 2012 IEEE International Solid-State Circuits Conference.
[45] Seung-Moon Yoo,et al. FlexRAM: Toward an advanced Intelligent Memory system , 1999, 2012 IEEE 30th International Conference on Computer Design (ICCD).
[46] Jong-Ho Kang,et al. A 1.2V 23nm 6F2 4Gb DDR3 SDRAM with local-bitline sense amplifier, hybrid LIO sense amplifier and dummy-less array architecture , 2012, 2012 IEEE International Solid-State Circuits Conference.
[47] David Blaauw,et al. Centip3De: A 64-Core, 3D Stacked Near-Threshold System , 2012, IEEE Micro.
[48] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[49] Karthikeyan Sankaralingam,et al. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.
[50] Young Choi,et al. A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme , 2012, 2012 IEEE International Solid-State Circuits Conference.
[51] Mario Konijnenburg,et al. ULP-SRP: Ultra low power Samsung Reconfigurable Processor for biomedical applications , 2012, 2012 International Conference on Field-Programmable Technology.
[52] Young-Hyun Jun,et al. A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM With 4 × 128 I/Os Using TSV Based Stacking , 2011, IEEE Journal of Solid-State Circuits.
[53] Paolo Ienne,et al. Elastic CGRAs , 2013, FPGA '13.
[54] Gabriel H. Loh,et al. A Processing-in-Memory Taxonomy and a Case for Studying Fixed-function PIM , 2013 .
[55] Michael M. Swift,et al. Efficient virtual memory for big memory servers , 2013, ISCA.
[56] José F. Martínez,et al. Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems , 2013, ISCA.
[57] Jun-Seok Park,et al. A 1.2 V 30 nm 3.2 Gb/s/pin 4 Gb DDR4 SDRAM With Dual-Error Detection and PVT-Tolerant Data-Fetch Scheme , 2012, IEEE Journal of Solid-State Circuits.
[58] Shekhar Borkar,et al. Role of Interconnects in the Future of Computing , 2013, Journal of Lightwave Technology.
[59] O Seongil,et al. Reducing memory access latency with asymmetric DRAM bank organizations , 2013, ISCA.
[60] Franz Franchetti,et al. A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing , 2013, 2013 IEEE International 3D Systems Integration Conference (3DIC).
[61] Antonia Zhai,et al. Triggered instructions: a control paradigm for spatially-programmed architectures , 2013, ISCA.
[62] Karthikeyan Sankaralingam,et al. A general constraint-centric scheduling framework for spatial architectures , 2013, PLDI.
[63] Mike Ignatowski,et al. High-level Programming Model Abstractions for Processing in Memory , 2013 .
[64] Eby G. Friedman,et al. AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.
[65] Jung Ho Ahn,et al. The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing , 2013, TACO.
[66] Ming Yang,et al. Sonic Millip3De: A massively parallel 3D-stacked accelerator for 3D ultrasound , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[67] Rajeev Balasubramonian,et al. Quantifying the relationship between the power delivery network and architectural policies in a 3D-stacked memory device , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[68] O Seongil,et al. Row-buffer decoupling: A case for low-latency DRAM microarchitecture , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[69] Yoav Etsion,et al. Single-graph multiple flows: Energy efficient design alternative for GPGPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[70] Mike Ignatowski,et al. TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.
[71] Nam Sung Kim,et al. Energy-efficient reconfigurable cache architectures for accelerator-enabled embedded systems , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[72] Feifei Li,et al. Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads , 2014, IEEE Micro.
[73] Jung Ho Ahn,et al. DRAMA: An Architecture for Accelerated Processing Near Memory , 2015, IEEE Computer Architecture Letters.