HiMA: A Fast and Scalable History-based Memory Access Engine for Differentiable Neural Computer

Memory-augmented neural networks (MANNs) achieve better inference performance on many tasks with the help of an external memory. The recently developed differentiable neural computer (DNC) is a MANN that has been shown to excel at representing complex data structures and learning long-term dependencies. DNC's higher performance is derived from new history-based attention mechanisms in addition to the previously used content-based attention mechanisms. History-based mechanisms require a variety of new compute primitives and state memories that are not supported by existing neural network (NN) or MANN accelerators. We present HiMA, a tiled, history-based memory access engine with memories distributed across tiles. HiMA incorporates a multi-mode network-on-chip (NoC) to reduce communication latency and improve scalability. An optimal submatrix-wise memory partition strategy reduces NoC traffic, and a two-stage usage sort method leverages the distributed tiles to improve computation speed. To make HiMA fundamentally scalable, we create a distributed version of DNC, called DNC-D, that allows almost all memory operations to be applied to local memories, with a trainable weighted summation producing the global memory output. Two approximation techniques, usage skimming and softmax approximation, are proposed to further improve hardware efficiency. HiMA prototypes are implemented in RTL and synthesized in a 40nm technology. In simulations, HiMA running DNC and DNC-D achieves 6.47× and 39.1× higher speed, 22.8× and 164.3× better area efficiency, and 6.1× and 61.2× better energy efficiency than the state-of-the-art MANN accelerator. Compared to an Nvidia 3080Ti GPU, HiMA achieves speedups of up to 437× and 2,646× when running DNC and DNC-D, respectively.
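
The sketch below illustrates, in Python, the distributed-memory idea described above: each tile holds a local slice of the DNC memory, content-based addressing is performed locally, and a trainable weighted summation combines the per-tile reads into the global memory output. This is a minimal illustration of the concept only, not the authors' RTL implementation; all names (content_read, dnc_d_read, alpha, tile sizes) are illustrative assumptions.

# Minimal sketch (assumptions, not the authors' implementation) of the DNC-D
# read path: local content addressing per tile, then a trainable weighted sum.
import numpy as np

def content_read(memory, key, beta=1.0):
    # Cosine-similarity content addressing followed by a softmax over slots,
    # as in a standard DNC read head; returns the tile's local read vector.
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sim)
    w /= w.sum()
    return w @ memory

def dnc_d_read(local_memories, key, alpha):
    # local_memories: list of per-tile memory matrices (rows = memory slots)
    # alpha: trainable per-tile mixing weights (normalized here for clarity)
    alpha = np.asarray(alpha, dtype=float)
    alpha = alpha / alpha.sum()
    tile_reads = [content_read(M, key) for M in local_memories]
    # Global memory output = weighted summation of the local tile reads.
    return sum(a * r for a, r in zip(alpha, tile_reads))

# Example: 4 tiles, each holding 16 slots of a 32-wide memory.
tiles = [np.random.randn(16, 32) for _ in range(4)]
query = np.random.randn(32)
global_read = dnc_d_read(tiles, query, alpha=[0.3, 0.2, 0.25, 0.25])

Because each tile reads only its own memory, almost all traffic stays local and only the short per-tile read vectors cross the NoC, which is what makes the DNC-D formulation scale with the number of tiles.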
