Parallel Architecture With Resistive Crosspoint Array for Dictionary Learning Acceleration

This paper proposes a parallel architecture with a resistive crosspoint array. The design of its two essential operations, read and write, is inspired by the biophysical behavior of a neural system, such as integrate-and-fire and local synapse weight update. The proposed hardware consists of an array of resistive random access memory (RRAM) devices and CMOS peripheral circuits, which perform matrix-vector multiplication and dictionary update in a fully parallel fashion, at a speed that is independent of the matrix dimension. The read and write circuits are implemented in 65 nm CMOS technology and verified together with an array of RRAM device models built from experimental data. The overall system exploits array-level parallelism and is demonstrated for accelerated dictionary learning tasks. Compared with a software implementation running on an 8-core CPU, the proposed hardware achieves more than 3000× speedup, enabling high-speed feature extraction on a single chip.
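As a rough illustration of the two array operations named above, the sketch below models an idealized crosspoint array in plain Python. It is not the circuit described in the paper: the function names, the outer-product form of the write, the learning rate, and the conductance bounds are all assumptions made for the example. The read treats each column current as the sum of conductance times row voltage, which is how a crosspoint array yields a matrix-vector product; the write changes each cell using only the signals on its own row and column, in the spirit of a local synapse weight update.

```python
# Illustrative sketch only: all names and numeric values are assumptions,
# not taken from the paper.

def crossbar_read(conductance, voltages):
    """Column currents of an idealized crosspoint array.

    conductance[i][j] is the cell joining row i and column j.
    Column j collects I_j = sum_i G[i][j] * V[i]. In hardware every
    column settles at the same time; this software model loops instead.
    """
    n_rows = len(conductance)
    n_cols = len(conductance[0])
    return [sum(conductance[i][j] * voltages[i] for i in range(n_rows))
            for j in range(n_cols)]


def crossbar_write(conductance, row_signal, col_signal, rate,
                   g_min=0.0, g_max=1.0):
    """Assumed local update: cell (i, j) changes by rate * row[i] * col[j].

    The result is clipped to [g_min, g_max], a stand-in for the finite
    conductance range of a real device.
    """
    return [[min(g_max, max(g_min, g + rate * r * c))
             for g, c in zip(row, col_signal)]
            for row, r in zip(conductance, row_signal)]


# A 3-row by 2-column array with made-up conductances.
G = [[0.2, 0.5],
     [0.4, 0.1],
     [0.3, 0.3]]
v = [1.0, 0.0, 2.0]

currents = crossbar_read(G, v)
G_new = crossbar_write(G, row_signal=[1.0, 0.0, -1.0],
                       col_signal=[1.0, 0.5], rate=0.1)
```

In this software model the read cost grows with the array size because of the loops. The abstract's claim of a speed independent of the matrix dimension refers to the hardware, where all cells contribute their currents concurrently.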
