A Novel Convolution Computing Paradigm Based on NOR Flash Array With High Computing Speed and Energy Efficiency

Convolution is a key operation in signal processing and machine learning applications. In this paper, we propose a novel convolution computing paradigm based on the NOR Flash array (NFA) that can execute a 2-D convolution in one clock cycle. To demonstrate the feasibility and efficiency of the proposed paradigm, we perform a feature extraction task on a $20\times 20$ image using the NFA structure. We also show that NOR Flash-driven convolution computing can process larger images. This paper presents a new approach to realizing high-speed, energy-efficient convolution computing for signal processing and convolutional neural networks.
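To illustrate how a whole 2-D convolution can collapse into a single step, the sketch below unrolls the kernel into a weight matrix so that one vector-matrix product yields every output pixel at once. This is a software analogy under stated assumptions, not the circuit described in the paper: the function names, the $3\times 3$ kernel size, the "valid" (no padding) output, and the one-column-per-output-pixel mapping are all hypothetical choices made for the example. The kernel is applied without flipping, as is conventional in convolutional neural networks.

```python
import numpy as np


def conv2d_as_matvec(image, kernel):
    """Valid 2-D convolution (no kernel flip) computed as one vector-matrix product.

    Assumed mapping (illustrative only): each column of `weights` holds the
    kernel coefficients for one output pixel, placed at the rows of the input
    pixels that the kernel window covers. An array of programmable cells could
    store such a matrix; signed coefficients would need extra handling in
    hardware (for example, differential columns), which is not modeled here.
    """
    H, W = image.shape
    kh, kw = kernel.shape
    oh, ow = H - kh + 1, W - kw + 1

    # One row per input pixel, one column per output pixel.
    weights = np.zeros((H * W, oh * ow))
    for i in range(oh):
        for j in range(ow):
            col = i * ow + j
            for a in range(kh):
                for b in range(kw):
                    weights[(i + a) * W + (j + b), col] = kernel[a, b]

    # A single multiply produces all output pixels together.
    out = image.reshape(-1) @ weights
    return out.reshape(oh, ow)


def conv2d_sliding(image, kernel):
    """Reference implementation: slide the kernel window over the image."""
    H, W = image.shape
    kh, kw = kernel.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

For a $20\times 20$ image and a $3\times 3$ kernel, the unrolled weight matrix in this sketch has 400 rows and 324 columns, and both functions return the same $18\times 18$ feature map. The sliding-window version needs one step per output pixel, whereas the unrolled version needs a single multiply, which is the property that makes a one-cycle array computation plausible.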
