DRISA: A DRAM-based Reconfigurable In-Situ Accelerator
暂无分享,去创建一个
Yuan Xie | Hongzhong Zheng | Dimin Niu | Shuangchen Li | Krishna T. Malladi | Bob Brennan | Shuangchen Li | Yuan Xie | Hongzhong Zheng | Dimin Niu | Bob Brennan
[1] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[2] Shuchang Zhou,et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.
[3] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[4] Franz Franchetti,et al. Data reorganization in memory using 3D-stacked DRAM , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[5] Christoforos E. Kozyrakis,et al. Practical Near-Data Processing for In-Memory Analytics Frameworks , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[6] Dave Brown,et al. Supplementary Material for An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing , 2013 .
[7] Bruce Jacob,et al. Technology comparison for large last-level caches (L3Cs): Low-leakage SRAM, low write-energy STT-RAM, and refresh-optimized eDRAM , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[8] David A. Patterson,et al. Computer Organization and Design, Fifth Edition: The Hardware/Software Interface , 2013 .
[9] Ying Zhang,et al. Recurrent Neural Networks With Limited Numerical Precision , 2016, ArXiv.
[10] Lior Pachter,et al. Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities , 2005, PLoS Comput. Biol..
[11] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Eby G. Friedman,et al. AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.
[13] Mike Ignatowski,et al. TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.
[14] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[15] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[17] Steven Swanson,et al. Near-Data Processing: Insights from a MICRO-46 Workshop , 2014, IEEE Micro.
[18] Mahmut T. Kandemir,et al. Scheduling techniques for GPU architectures with processing-in-memory capabilities , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[19] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[20] Sean Ahern. The Path to Exascale , 2012, High Performance Visualization.
[21] Gabriel H. Loh,et al. Thermal Feasibility of Die-Stacked Processing in Memory , 2014 .
[22] Jeffrey S. Vetter,et al. On the Path to Exascale , 2010, Int. J. Distributed Syst. Technol..
[23] Dong Han,et al. Cambricon: An Instruction Set Architecture for Neural Networks , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[24] Shengen Yan,et al. Deep Image: Scaling up Image Recognition , 2015, ArXiv.
[25] Andrew S. Cassidy,et al. A million spiking-neuron integrated circuit with a scalable communication network and interface , 2014, Science.
[26] Tom W. Chen,et al. Assessing merged DRAM/Logic technology , 1999, Integr..
[27] Vivek Seshadri,et al. Simple DRAM and Virtual Memory Abstractions to Enable Highly Efficient Memory Systems , 2016, ArXiv.
[28] Kiyoung Choi,et al. AIM , 2016 .
[29] Wenguang Chen,et al. NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] Tze Meng Low,et al. Enabling portable energy efficiency with memory accelerated library , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[31] Tejas Karkhanis,et al. Active Memory Cube: A processing-in-memory architecture for exascale systems , 2015, IBM J. Res. Dev..
[32] 장훈,et al. [서평]「Computer Organization and Design, The Hardware/Software Interface」 , 1997 .
[33] Mahmut T. Kandemir,et al. Data Movement Aware Computation Partitioning , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[34] O Seongil,et al. Reducing memory access latency with asymmetric DRAM bank organizations , 2013, ISCA.
[35] Onur Mutlu,et al. Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).
[36] Ran El-Yaniv,et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res..
[37] Kiyoung Choi,et al. Buffered compares: Excavating the hidden parallelism inside DRAM architectures with lightweight logic , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[38] Christoforos E. Kozyrakis,et al. A case for intelligent RAM , 1997, IEEE Micro.
[39] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[40] Jaejin Lee,et al. 25.2 A 1.2V 8Gb 8-channel 128GB/s high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29nm process and TSV , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[41] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.
[42] Christina Delimitrou,et al. DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[43] Subramanian S. Iyer,et al. A 14 nm 1.1 Mb Embedded DRAM Macro With 1 ns Access , 2016, IEEE Journal of Solid-State Circuits.
[44] Onur Mutlu,et al. A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[45] Jung Ho Ahn,et al. Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[46] Onur Mutlu,et al. Fast Bulk Bitwise AND and OR in DRAM , 2015, IEEE Computer Architecture Letters.
[47] Umut Arslan,et al. 13.1 A 1Gb 2GHz embedded DRAM in 22nm tri-gate CMOS technology , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[48] Feng Lin,et al. DRAM Circuit Design: Fundamental and High-Speed Topics , 2007 .
[49] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[50] Kevin Skadron,et al. HotSpot: a compact thermal modeling methodology for early-stage VLSI design , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[51] Tao Zhang,et al. Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[52] David Harris,et al. CMOS VLSI Design: A Circuits and Systems Perspective , 2004 .
[53] Sudhakar Yalamanchili,et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[54] J. Thomas Pawlowski,et al. Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[55] Tao Zhang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[56] Ran El-Yaniv,et al. Binarized Neural Networks , 2016, NIPS.
[57] William J. Dally,et al. Scaling the Power Wall: A Path to Exascale , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[58] Pedro Trancoso. Moving to memoryland: in-memory computation for existing applications , 2015, Conf. Computing Frontiers.
[59] Sung-Mo Kang,et al. CMOS digital integrated circuits , 1995 .
[60] Mingyu Gao,et al. HRL: Efficient and flexible reconfigurable logic for near-data processing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[61] Yoshua Bengio,et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.
[62] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[63] Eriko Nurvitadhi,et al. Accelerating Deep Convolutional Networks using low-precision and sparsity , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[64] Norman P. Jouppi,et al. CACTI 6.0: A Tool to Model Large Caches , 2009 .
[65] Jung Ho Ahn,et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[66] Rachata Ausavarungnirun,et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[67] Onur Mutlu,et al. Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM , 2016, ArXiv.
[68] Kiyoung Choi,et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[69] Florence March,et al. 2016 , 2016, Affair of the Heart.
[70] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Juliane Junker,et al. Computer Organization And Design The Hardware Software Interface , 2016 .
[72] S. M. García,et al. 2014: , 2020, A Party for Lazarus.
[73] Thomas Vogelsang,et al. Understanding the Energy Consumption of Dynamic Random Access Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[74] Andrew S. Cassidy,et al. Cognitive computing systems: Algorithms and applications for networks of neurosynaptic cores , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).
[75] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[76] Jung Ho Ahn,et al. CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[77] Engin Ipek,et al. Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning , 2017 .
[78] Feifei Li,et al. NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[79] Bin Liu,et al. Ternary Weight Networks , 2016, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[80] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[81] J. M. Park,et al. 20nm DRAM: A new beginning of another revolution , 2015, 2015 IEEE International Electron Devices Meeting (IEDM).
[82] Cong Xu,et al. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[83] Onur Mutlu,et al. LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory , 2017, IEEE Computer Architecture Letters.
[84] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[85] Yoshua Bengio,et al. BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 , 2016, ArXiv.
[86] Jung Ho Ahn,et al. Accelerating linked-list traversal through near-data processing , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[87] Dharmendra S. Modha,et al. A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm , 2011, 2011 IEEE Custom Integrated Circuits Conference (CICC).
[88] Dae-Hyun Kim,et al. ArchShield: architectural framework for assisting DRAM scaling by tolerating high error rates , 2013, ISCA.
[89] Feifei Li,et al. Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads , 2014, IEEE Micro.