PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory
暂无分享,去创建一个
Yu Wang | Jishen Zhao | Shuangchen Li | Yongpan Liu | Tao Zhang | Yuan Xie | Ping Chi | Cong Xu | Yu Wang | Jishen Zhao | Ping Chi | Cong Xu | Shuangchen Li | Yuan Xie | Yongpan Liu | Tao Zhang | Conglei Xu | Zhang Tao
[1] Maya Gokhale,et al. Processing in Memory: The Terasys Massively Parallel PIM Array , 1995, Computer.
[2] Anoop Gupta,et al. Operating system support for improving data locality on CC-NUMA compute servers , 1996, ASPLOS VII.
[3] Noah Treuhaft,et al. Scalable Processors in the Billion-Transistor Era: IRAM , 1997, Computer.
[4] Kunle Olukotun,et al. A Single Chip Multiprocessor Integrated with DRAM , 1997 .
[5] Duncan G. Elliott,et al. Computational RAM: The case for SIMD computing in memory , 1997 .
[6] Noah Treuhaft,et al. Intelligent RAM (IRAM): the industrial setting, applications, and architectures , 1997, Proceedings International Conference on Computer Design VLSI in Computers and Processors.
[7] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[8] Frederic T. Chong,et al. Active pages: a computation model for intelligent memory , 1998, ISCA.
[9] Chun Chen,et al. The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.
[10] Sanjeev Kumar,et al. Dynamic tracking of page miss ratio curve for memory management , 2004, ASPLOS XI.
[11] Yasar Becerikli,et al. Neural Network Implementation in Hardware Using FPGAs , 2006, ICONIP.
[12] Yann LeCun,et al. CNP: An FPGA-based processor for Convolutional Networks , 2009, 2009 International Conference on Field Programmable Logic and Applications.
[13] Vijayalakshmi Srinivasan,et al. Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Hoi-Jun Yoo,et al. A 201.4 GOPS 496 mW Real-Time Multi-Object Recognition Processor With Bio-Inspired Neural Perception Engine , 2009, IEEE Journal of Solid-State Circuits.
[15] Moinuddin K. Qureshi,et al. Morphable memory system: a robust architecture for exploiting multi-level phase change memories , 2010, ISCA.
[16] Shimeng Yu,et al. Investigating the switching dynamics and multilevel capability of bipolar metal oxide resistive switching memory , 2011 .
[17] Dharmendra S. Modha,et al. A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm , 2011, 2011 IEEE Custom Integrated Circuits Conference (CICC).
[18] Chung Lam,et al. A Novel Reconfigurable Sensing Scheme for Variable Level Storage in Phase Change Memory , 2011, 2011 3rd IEEE International Memory Workshop (IMW).
[19] J. Thomas Pawlowski,et al. Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[20] Luca Maria Gambardella,et al. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Flexible, High Performance Convolutional Neural Networks for Image Classification , 2022 .
[21] Fabien Alibart,et al. Hybrid CMOS/nanodevice circuits for high throughput pattern matching applications , 2011, 2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS).
[22] Yong Liu,et al. A 45nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons , 2011, 2011 IEEE Custom Integrated Circuits Conference (CICC).
[23] William J. Dally,et al. GPUs and the Future of Parallel Computing , 2011, IEEE Micro.
[24] Kinam Kim,et al. A fast, high-endurance and scalable non-volatile memory device made from asymmetric Ta2O(5-x)/TaO(2-x) bilayer structures. , 2011, Nature materials.
[25] Engin Ipek,et al. A resistive TCAM accelerator for data-intensive computing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Jung Ho Ahn,et al. CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[27] Qing Wu,et al. Hardware realization of BSB recall function using memristor crossbar arrays , 2012, DAC Design Automation Conference 2012.
[28] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[29] Shimeng Yu,et al. Metal–Oxide RRAM , 2012, Proceedings of the IEEE.
[30] Shoji Sakamoto,et al. An 8Mb multi-layered cross-point ReRAM macro with 443MB/s write throughput , 2012, 2012 IEEE International Solid-State Circuits Conference.
[31] Cong Xu,et al. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[32] J. Jeddeloh,et al. Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[33] W. Jang,et al. A study on low-power, nanosecond operation and multilevel bipolar resistance switching in Ti/ZrO2/Pt nonvolatile memory with 1T1R architecture , 2012 .
[34] Cong Xu,et al. Design trade-offs for high density cross-point resistive memory , 2012, ISLPED '12.
[35] Ligang Gao,et al. High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm , 2011, Nanotechnology.
[36] Andrew B. Kahng,et al. CACTI-IO: CACTI with off-chip power-area-timing models , 2012, 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[37] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.
[38] D. Strukov,et al. A High Resolution Nonvolatile Analog Memory Ionic Devices , 2013 .
[39] Yu Wang,et al. Memristor-based approximated computation , 2013, International Symposium on Low Power Electronics and Design (ISLPED).
[40] Shimeng Yu,et al. 3D vertical RRAM - Scaling limit analysis and demonstration of 3D array operation , 2013, 2013 Symposium on VLSI Technology.
[41] Masahide Matsumoto,et al. A 130.7mm2 2-layer 32Gb ReRAM memory device in 24nm technology , 2013, 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers.
[42] Andrew S. Cassidy,et al. Cognitive computing systems: Algorithms and applications for networks of neurosynaptic cores , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).
[43] Norman P. Jouppi,et al. Understanding the trade-offs in multi-level cell ReRAM memory design , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[44] John Shalf,et al. Design of a large-scale storage-class RRAM system , 2013, ICS '13.
[45] Chung-Wei Hsu,et al. Self-rectifying bipolar TaOx/TiO2 RRAM with superior endurance over 1012 cycles for 3D high-density storage-class memory , 2013, 2013 Symposium on VLSI Technology.
[46] Rachata Ausavarungnirun,et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[47] Wei Zhang,et al. Digital-assisted noise-eliminating training for memristor crossbar-based analog neuromorphic computing engine , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[48] Fabien Alibart,et al. Pattern classification by memristive crossbar circuits using ex situ and in situ training , 2013, Nature Communications.
[49] Yukio Hayakawa,et al. An 8 Mb Multi-Layered Cross-Point ReRAM Macro With 443 MB/s Write Throughput , 2012, IEEE Journal of Solid-State Circuits.
[50] Chris Yakopcic,et al. Exploring the design space of specialized multicore neural processors , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).
[51] Eby G. Friedman,et al. AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.
[52] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[53] Yiran Chen,et al. BSB training scheme implementation on memristor-based circuit , 2013, 2013 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA).
[54] Yiran Chen,et al. Reduction and IR-drop compensations techniques for reliable neuromorphic computing systems , 2014, 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[55] Steven Swanson,et al. Near-Data Processing: Insights from a MICRO-46 Workshop , 2014, IEEE Micro.
[56] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[57] Yu Wang,et al. Training itself: Mixed-signal training acceleration for memristor-based neural network , 2014, 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC).
[58] Luis Ceze,et al. General-purpose code acceleration with limited-precision analog computation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[59] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[60] Feifei Li,et al. NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[61] Zvika Guz. Real-Time Analytics as the Killer Application for Processing-In-Memory , 2014 .
[62] Yoshua Bengio,et al. Low precision storage for deep learning , 2014 .
[63] G. W. Burr,et al. Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element , 2015, 2014 IEEE International Electron Devices Meeting.
[64] Jaejin Lee,et al. 25.2 A 1.2V 8Gb 8-channel 128GB/s high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29nm process and TSV , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[65] Cong Xu,et al. Architecting 3D vertical resistive memory for next-generation storage systems , 2014, 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[66] Jacques-Olivier Klein,et al. Spin-transfer torque magnetic memory as a stochastic memristive synapse , 2014, 2014 IEEE International Symposium on Circuits and Systems (ISCAS).
[67] Mike Ignatowski,et al. TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.
[68] Dmitri B. Strukov,et al. SpongeDirectory: Flexible sparse directories utilizing multi-level memristors , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[69] Tao Zhang,et al. Overcoming the challenges of crossbar resistive memory architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[70] Jacob Nelson,et al. SNNAP: Approximate computing on programmable SoCs via neural acceleration , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[71] Yong Zhang,et al. A Reconfigurable Digital Neuromorphic Processor with Memristive Synaptic Crossbar for Cognitive Computing , 2015, ACM J. Emerg. Technol. Comput. Syst..
[72] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[73] Babak Falsafi,et al. Sort vs. Hash Join Revisited for Near-Memory Execution , 2015 .
[74] Stephen W. Keckler,et al. Page Placement Strategies for GPUs within Heterogeneous Memory Systems , 2015, ASPLOS.
[75] Xuehai Zhou,et al. PuDianNao: A Polyvalent Machine Learning Accelerator , 2015, ASPLOS.
[76] Franz Franchetti,et al. Data reorganization in memory using 3D-stacked DRAM , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[77] Tejas Karkhanis,et al. Active Memory Cube: A processing-in-memory architecture for exascale systems , 2015, IBM J. Res. Dev..
[78] Y. Leblebici,et al. Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: Comparative performance analysis (accuracy, speed, and power) , 2015, 2015 IEEE International Electron Devices Meeting (IEDM).
[79] Farnood Merrikh-Bayat,et al. Training and operation of an integrated neuromorphic network based on metal-oxide memristors , 2014, Nature.
[80] Peng Huang,et al. Optimized learning scheme for grayscale image recognition in a RRAM based analog neuromorphic system , 2015, 2015 IEEE International Electron Devices Meeting (IEDM).
[81] Andrew B. Kahng,et al. CACTI-IO: CACTI With OFF-Chip Power-Area-Timing Models , 2015, IEEE Trans. Very Large Scale Integr. Syst..
[82] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[83] Yu Wang,et al. RRAM-Based Analog Approximate Computing , 2015, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[84] Yu Wang,et al. SEAL-lab Technical Report – No . 2015-001 ( April 29 , 2016 ) Processing-in-Memory in ReRAM-based Main Memory , 2016 .
[85] Shaoli Liu,et al. Cambricon: An Instruction Set Architecture for Neural Networks , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[86] Cong Xu,et al. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[87] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).