Emerging Neural Workloads and Their Impact on Hardware

We consider existing and emerging neural workloads and the hardware accelerators best suited to them. We begin with a discussion of analog crossbar arrays, which are well suited to the matrix-vector multiplication operations that dominate existing neural network models such as convolutional neural networks (CNNs). We highlight candidate crosspoint devices, the device- and materials-level challenges that must be overcome before a given device can be employed in a crossbar array for a computationally interesting neural workload, and the circuit and algorithmic optimizations that can mitigate undesirable device/material characteristics. We then discuss two emerging neural workloads. The first is machine learning models for one- and few-shot learning tasks, i.e., tasks where a network is trained with just one or a few representative examples of a given class. Notably, crossbar-based architectures can accelerate such models; hardware solutions based on content-addressable memory arrays are also discussed. The second is machine learning models for recommendation systems. Recommendation models employ distinct neural network architectures that operate on both continuous and categorical input features, which makes hardware acceleration challenging. We conclude with the open research challenges and opportunities in this space.
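
Because a crossbar evaluates the matrix-vector product in the analog domain, via Ohm's and Kirchhoff's laws, device non-idealities such as programming noise and finite ADC resolution perturb the result directly. The Python sketch below is a minimal illustration of that effect, not a model of any specific accelerator discussed here; the differential conductance-pair weight mapping, the Gaussian noise model and its magnitude, and the idealized ADC are all illustrative assumptions.

```python
import numpy as np

def crossbar_mvm(weights, x, g_max=1.0, noise_std=0.02, adc_bits=8):
    """Simulate y = W @ x on an analog crossbar (toy model).

    Signed weights are encoded as differential conductance pairs
    (G+ - G-), programming noise is modeled as an additive Gaussian
    perturbation on each device, and the accumulated bit-line
    currents are digitized by an idealized uniform ADC.
    """
    rng = np.random.default_rng(0)

    # Map signed weights onto a differential pair of nonnegative conductances.
    w_scale = np.abs(weights).max() + 1e-12
    g_pos = np.clip(weights, 0.0, None) / w_scale * g_max
    g_neg = np.clip(-weights, 0.0, None) / w_scale * g_max

    # Device-to-device programming noise (illustrative magnitude).
    g_pos = g_pos + rng.normal(0.0, noise_std * g_max, g_pos.shape)
    g_neg = g_neg + rng.normal(0.0, noise_std * g_max, g_neg.shape)

    # Ohm's law + Kirchhoff's current law: per-device currents sum on bit lines.
    i_out = (g_pos - g_neg) @ x

    # Idealized ADC: uniform quantization over the observed current range.
    levels = 2 ** adc_bits
    i_max = np.abs(i_out).max() + 1e-12
    i_out = np.round(i_out / i_max * (levels / 2)) / (levels / 2) * i_max

    # Undo the weight-to-conductance scaling to recover y ~= W @ x.
    return i_out * w_scale / g_max

# Example: a 4x8 weight matrix applied to an 8-element input vector.
W = np.random.default_rng(1).normal(size=(4, 8))
x = np.random.default_rng(2).normal(size=8)
print("analog:", crossbar_mvm(W, x))
print("exact: ", W @ x)
```

Sweeping noise_std or adc_bits in a toy model like this gives a first-order sense of why the circuit- and algorithm-level mitigations mentioned above matter.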
