Survey of Machine Learning Accelerators

New machine learning accelerators are being announced and released each month for applications ranging from speech recognition and video object detection to assisted driving and data center workloads. This paper updates the survey of AI accelerators and processors from last year's IEEE-HPEC paper. It collects and summarizes the accelerators that have been publicly announced with peak performance and power consumption numbers. These performance and power values are plotted on a scatter graph, and several trends observed on this plot are discussed and analyzed; for instance, there are notable trends regarding power consumption, numerical precision, and inference versus training. This year, many more accelerators have been announced, spanning a wider range of architectures and technologies, including vector engines, dataflow engines, neuromorphic designs, flash-based analog memory processing, and photonic-based processing.
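As a rough illustration of the scatter-graph methodology described above, the following minimal Python sketch plots peak performance against peak power on log-log axes, with marker shape distinguishing inference from training parts. The accelerator names and numbers here are illustrative placeholders only, not values reported in the survey; the published figures would be substituted when reproducing the actual plot.

```python
# Minimal sketch of a peak-performance vs. power scatter plot (log-log axes).
# NOTE: the entries below are hypothetical placeholders, NOT data from the survey.
import matplotlib.pyplot as plt

accelerators = {
    # name: (peak performance [GOPS], peak power [W], category)
    "EdgeChipA":   (1.0e3, 2.0,   "inference"),
    "ServerCardB": (1.0e5, 75.0,  "inference"),
    "TrainerC":    (1.0e6, 300.0, "training"),
}

fig, ax = plt.subplots()
for name, (perf, power, category) in accelerators.items():
    marker = "o" if category == "inference" else "^"   # shape encodes inference vs. training
    ax.scatter(power, perf, marker=marker)
    ax.annotate(name, (power, perf), textcoords="offset points", xytext=(5, 5))

# Accelerators span many orders of magnitude in both dimensions, hence log-log axes.
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("Peak power (W)")
ax.set_ylabel("Peak performance (GOPS)")
ax.set_title("Peak performance vs. power (placeholder data)")
plt.show()
```

In the survey's plot, additional dimensions such as numerical precision and form factor can be encoded with marker color or shape in the same way.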
