Future Computing Hardware for AI

Hardware has taken on a supporting role in the maturation and proliferation of narrow AI, but it will take a leading role in enabling the innovation and adoption of broad AI. The concurrent evolution of broad AI with purpose-built hardware will shift traditional balances between cloud and edge, structured and unstructured data, and training and inference. Heterogeneous system architectures are already being delivered in which varied compute resources, including high-bandwidth CPUs, specialized AI accelerators, and high-performance networking, are combined in each node to yield significant performance improvements. Looking to the future, we envision a roadmap of specialized technologies to accelerate AI: starting with heterogeneous digital von Neumann machines, exploring reduced-precision accelerator approaches, probing the limits of conventional device power-performance with analog AI devices, and finishing with quantum computing for AI.
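
As a rough illustration of the reduced-precision idea in this roadmap, the sketch below (purely illustrative and not taken from the paper) rounds weights and activations to 16-bit floating point while computing the matrix product in 32-bit arithmetic, mimicking accelerators that store and multiply low-precision operands but accumulate at higher precision; the array shapes and NumPy usage are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Full-precision reference: weights and activations in float32.
W = rng.standard_normal((256, 512), dtype=np.float32)
x = rng.standard_normal((512, 64), dtype=np.float32)
y_fp32 = W @ x

# Reduced-precision variant: operands are stored (and rounded) in float16,
# then the product is computed in float32, mimicking hardware that reads
# low-precision inputs and accumulates partial sums at higher precision.
W_fp16 = W.astype(np.float16)
x_fp16 = x.astype(np.float16)
y_mixed = W_fp16.astype(np.float32) @ x_fp16.astype(np.float32)

# The output error stays small even though operand storage (and, on real
# accelerators, the multiplier datapath) uses half the bits.
rel_err = np.linalg.norm(y_mixed - y_fp32) / np.linalg.norm(y_fp32)
print(f"relative error with fp16 operands, fp32 accumulation: {rel_err:.2e}")
```

The design point this is meant to convey is that most of the silicon cost of a deep-learning workload sits in the multiply-accumulate datapath, so shrinking operand precision while preserving a wider accumulator recovers most of the energy and area savings with little accuracy loss.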
