A survey of neural network accelerators

Machine-learning techniques have recently been proved to be successful in various domains, especially in emerging commercial applications. As a set of machine-learning techniques, artificial neural networks (ANNs), requiring considerable amount of computation and memory, are one of the most popular algorithms and have been applied in a broad range of applications such as speech recognition, face identification, natural language processing, ect. Conventionally, as a straightforward way, conventional CPUs and GPUs are energy-inefficient due to their excessive effort for flexibility. According to the aforementioned situation, in recent years, many researchers have proposed a number of neural network accelerators to achieve high performance and low power consumption. Thus, the main purpose of this literature is to briefly review recent related works, as well as the DianNao-family accelerators. In summary, this review can serve as a reference for hardware researchers in the area of neural networks.

[1]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[2]  Dimitrios Zissis,et al.  A cloud based architecture capable of perceiving and predicting multiple vessel behaviour , 2015, Appl. Soft Comput..

[3]  H. T. Kung Why systolic architectures? , 1982, Computer.

[4]  A. Iwata,et al.  An artificial neural network accelerator using general purpose 24 bit floating point digital signal processors , 1989, International 1989 Joint Conference on Neural Networks.

[5]  Yogesh Singh,et al.  Fault tolerance of feedforward artificial neural networks- a framework of study , 2003, Proceedings of the International Joint Conference on Neural Networks, 2003..

[6]  Jae-Jin Lee,et al.  Super-Systolic Array for 2D Convolution , 2006, TENCON 2006 - 2006 IEEE Region 10 Conference.

[7]  Luis Ceze,et al.  Neural Acceleration for General-Purpose Approximate Programs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[8]  Tarek M. Taha,et al.  Scaling analysis of a neocortex inspired cognitive model on the Cray XD1 , 2008, The Journal of Supercomputing.

[9]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[10]  Babak Nadjar Araabi,et al.  Neural network stream processing core (NnSP) for embedded systems , 2006, 2006 IEEE International Symposium on Circuits and Systems.

[11]  Sven Behnke,et al.  Accelerating Large-Scale Convolutional Neural Networks with Parallel Graphics Multiprocessors , 2010, ICANN.

[12]  Francisco Cardells-Tormo,et al.  Area-efficient 2-D shift-variant convolvers for FPGA-based digital image processing , 2005, IEEE Workshop on Signal Processing Systems Design and Implementation, 2005..

[13]  Hui Li,et al.  Evolutionary artificial neural networks: a review , 2011, Artificial Intelligence Review.

[14]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[15]  Yiran Chen,et al.  BSB training scheme implementation on memristor-based circuit , 2013, 2013 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA).

[16]  F. Attneave,et al.  The Organization of Behavior: A Neuropsychological Theory , 1949 .

[17]  Hoi-Jun Yoo,et al.  A 201.4 GOPS 496 mW Real-Time Multi-Object Recognition Processor With Bio-Inspired Neural Perception Engine , 2009, IEEE Journal of Solid-State Circuits.

[18]  D. George,et al.  A hierarchical Bayesian model of invariant pattern recognition in the visual cortex , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..

[19]  Bruce A. Draper,et al.  Accelerated image processing on FPGAs , 2003, IEEE Trans. Image Process..

[20]  Quoc V. Le,et al.  Scalable learning for object detection with GPU hardware , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[21]  Francisco Cardells-Tormo,et al.  Area-efficient 2-D shift-variant convolvers for FPGA-based digital image processing , 2006, IEEE Transactions on Circuits and Systems II: Express Briefs.

[22]  Kar Yan Tam,et al.  Neural network models and the prediction of bank bankruptcy , 1991 .

[23]  Silke A.T. Weber,et al.  Social-Spider Optimization-Based Artificial Neural Networks Training and Its Applications for Parkinson's Disease Identification , 2014, 2014 IEEE 27th International Symposium on Computer-Based Medical Systems.

[24]  Qing Wu,et al.  Hardware realization of BSB recall function using memristor crossbar arrays , 2012, DAC Design Automation Conference 2012.

[25]  Jun-Seok Park,et al.  14.6 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).

[26]  Yann LeCun,et al.  Convolutional networks and applications in vision , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[27]  Kunle Olukotun,et al.  A highly scalable Restricted Boltzmann Machine FPGA implementation , 2009, 2009 International Conference on Field Programmable Logic and Applications.

[28]  Tianshi Chen,et al.  ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[29]  Berin Martini,et al.  A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[30]  Yann LeCun,et al.  CNP: An FPGA-based processor for Convolutional Networks , 2009, 2009 International Conference on Field Programmable Logic and Applications.

[31]  Hava T. Siegelmann,et al.  Neural networks and analog computation - beyond the Turing limit , 1999, Progress in theoretical computer science.

[32]  Christoforos E. Kozyrakis,et al.  Convolution engine: balancing efficiency & flexibility in specialized computing , 2013, ISCA.

[33]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[34]  Narayanan Vijaykrishnan,et al.  Accelerating neuromorphic vision algorithms for recognition , 2012, DAC Design Automation Conference 2012.

[35]  Dias F. Morgado,et al.  Fault Tolerance of Artificial Neural Networks: an Open Discussion for a Global Model , 2010 .

[36]  Hao Jiang,et al.  RENO: A high-efficient reconfigurable neuromorphic computing accelerator design , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[37]  Johannes Schemmel,et al.  Wafer-scale integration of analog neural networks , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[38]  Vivienne Sze,et al.  14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks , 2016, ISSCC.

[39]  C. C. Stearns,et al.  A reconfigurable 64-tap transversal filter , 1988, Proceedings of the IEEE 1988 Custom Integrated Circuits Conference.

[40]  Milton S. Boyd,et al.  Designing a neural network for forecasting financial and economic time series , 1996, Neurocomputing.

[41]  V. Hecht,et al.  An advanced programmable 2D-convolution chip for, real time image processing , 1991, 1991., IEEE International Sympoisum on Circuits and Systems.

[42]  Daniel L. Palumbo,et al.  Performance and fault-tolerance of neural networks for optimization , 1993, IEEE Trans. Neural Networks.

[43]  Srihari Cadambi,et al.  A dynamically configurable coprocessor for convolutional neural networks , 2010, ISCA.

[44]  David West,et al.  Neural network ensemble strategies for financial decision applications , 2005, Comput. Oper. Res..

[45]  Metin Nafi Gürcan,et al.  Coordinating the use of GPU and CPU for improving performance of compute intensive applications , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[46]  E. Michael Azoff,et al.  Neural Network Time Series: Forecasting of Financial Markets , 1994 .

[47]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[48]  Yu Wang,et al.  Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[49]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Frederick G. Harmon,et al.  The control of a parallel hybrid-electric propulsion system for a small unmanned aerial vehicle using a CMAC neural network , 2005, Neural Networks.

[51]  Ah Chung Tsoi,et al.  FIR and IIR Synapses, a New Neural Network Architecture for Time Series Modeling , 1991, Neural Computation.

[52]  A. Ayatollahi,et al.  Implementation of biologically plausible spiking neural network models on the memristor crossbar-based CMOS/nano circuits , 2009, 2009 European Conference on Circuit Theory and Design.

[53]  Yu Cao,et al.  Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks , 2016, FPGA.

[54]  Ninghui Sun,et al.  DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.

[55]  Berin Martini,et al.  Hardware accelerated convolutional neural networks for synthetic vision systems , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[56]  Jamal Salahaldeen Majeed Alneamy,et al.  Heart Disease Diagnosis Utilizing Hybrid Fuzzy Wavelet Neural Network and Teaching Learning Based Optimization Algorithm , 2014, Adv. Artif. Neural Syst..

[57]  Jason Cong,et al.  Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.

[58]  R. Kunemund,et al.  Programmable 2D linear filter for video applications , 1989 .

[59]  Luca Maria Gambardella,et al.  Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Flexible, High Performance Convolutional Neural Networks for Image Classification , 2022 .

[60]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[61]  Luis A. Plana,et al.  SpiNNaker: Mapping neural networks onto a massively-parallel chip multiprocessor , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[62]  Berin Martini,et al.  NeuFlow: A runtime reconfigurable dataflow processor for vision , 2011, CVPR 2011 WORKSHOPS.

[63]  Jake K. Aggarwal,et al.  Parallel 2-D Convolution on a Mesh Connected Array Processor , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[64]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[65]  René de Jesús Romero-Troncoso,et al.  Mlp neural network and on-line backpropagation learning implementation in a low-cost fpga , 2008, GLSVLSI '08.

[66]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[67]  W. Pitts,et al.  A Logical Calculus of the Ideas Immanent in Nervous Activity (1943) , 2021, Ideas That Created the Future.

[68]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[69]  Shefa A. Dawwd The multi 2D systolic design and implementation of Convolutional Neural Networks , 2013, 2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS).

[70]  Mikko H. Lipasti,et al.  Automatic abstraction and fault tolerance in cortical microachitectures , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[71]  K. Steinhubl Design of Ion-Implanted MOSFET'S with Very Small Physical Dimensions , 1974 .

[72]  Srihari Cadambi,et al.  A Massively Parallel Coprocessor for Convolutional Neural Networks , 2009, 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors.

[73]  Hoi-Jun Yoo,et al.  4.6 A1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[74]  Hava T. Siegelmann,et al.  Analog computation via neural networks , 1993, [1993] The 2nd Israel Symposium on Theory and Computing Systems.

[75]  Dharmendra S. Modha,et al.  A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm , 2011, 2011 IEEE Custom Integrated Circuits Conference (CICC).

[76]  Henk Corporaal,et al.  Memory-centric accelerator design for Convolutional Neural Networks , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[77]  Giovanni Soda,et al.  Local Feedback Multilayered Networks , 1992, Neural Computation.

[78]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[79]  Xuehai Zhou,et al.  PuDianNao: A Polyvalent Machine Learning Accelerator , 2015, ASPLOS.

[80]  Basil Sh. Mahmood,et al.  A reconfigurable interconnected filter for face recognition based on convolution neural network , 2009, 2009 4th International Design and Test Workshop (IDT).

[81]  Alex Krizhevsky,et al.  One weird trick for parallelizing convolutional neural networks , 2014, ArXiv.

[82]  Zoran Obradovic,et al.  A Neural Network-Based Method for Site-Specific Fertilization Recommendation , 2001 .

[83]  J. Knott The organization of behavior: A neuropsychological theory , 1951 .

[84]  A. Mishra,et al.  Drought forecasting using feed-forward recursive neural network , 2006 .

[85]  P. Werbos,et al.  Beyond Regression : "New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .

[86]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[87]  Olivier Temam,et al.  Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators , 2014, 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC).

[88]  Keechul Jung,et al.  GPU implementation of neural networks , 2004, Pattern Recognit..

[89]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[90]  Vincent Vanhoucke,et al.  Improving the speed of neural networks on CPUs , 2011 .

[91]  Jia Wang,et al.  DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[92]  Tao Wang,et al.  Deep learning with COTS HPC systems , 2013, ICML.

[93]  E. Culurciello,et al.  NeuFlow: Dataflow vision processing system-on-a-chip , 2012, 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS).

[94]  Olivier Temam,et al.  A defect-tolerant accelerator for emerging high-performance applications , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).