TaxoNN: A Light-Weight Accelerator for Deep Neural Network Training

Emerging intelligent embedded devices rely on Deep Neural Networks (DNNs) to interact with the real-world environment. This interaction requires the ability to retrain DNNs on the device, since environmental conditions change continuously over time. Stochastic Gradient Descent (SGD) is a widely used algorithm that trains DNNs by iteratively optimizing the parameters over the training data. In this work, we first present a novel approach that adds training capability to a baseline inference-only DNN accelerator by splitting the SGD algorithm into simple computational elements. Then, based on this heuristic approach, we propose TaxoNN, a light-weight accelerator for DNN training. TaxoNN tunes the DNN weights by reusing the hardware resources of the inference datapath through a time-multiplexing approach and low-bitwidth units. Our experimental results show that TaxoNN incurs, on average, only a 0.97% higher misclassification rate than a full-precision implementation, while providing 2.1$\times$ power saving and 1.65$\times$ area reduction over the state-of-the-art DNN training accelerator.
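
The approach summarized above, splitting the SGD update into the same simple compute primitives that an inference accelerator already provides, can be illustrated with a short software sketch. The Python snippet below shows that decomposition for a single linear neuron with squared-error loss; it is only an illustration, not the TaxoNN hardware or its dataflow, and the 8-bit quantization scheme, learning rate, and toy task are assumptions made for the example.

```python
import numpy as np

np.random.seed(0)

def quantize(x, bits=8):
    """Uniform symmetric quantization onto a low-bitwidth grid (illustrative scheme)."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1) + 1e-12
    q = np.clip(np.round(x / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale  # values restricted to 2**bits distinct levels

def mac(a, b, acc=0.0):
    """The single primitive an inference datapath provides: acc += a * b."""
    return acc + a * b

def forward(w, x):
    """Inference for one linear neuron, built only from MAC operations."""
    acc = 0.0
    for wi, xi in zip(w, x):
        acc = mac(wi, xi, acc)
    return acc

def sgd_step(w, x, y_true, lr=0.05, bits=8):
    """One SGD update for squared-error loss: dL/dw_i = (y_pred - y_true) * x_i.
    The gradient uses the same MAC primitive as the forward pass, so the two
    phases could share hardware in a time-multiplexed fashion."""
    y_pred = forward(w, x)                  # forward pass (first time slot)
    err = y_pred - y_true                   # scalar error term
    grad = [mac(err, xi) for xi in x]       # gradient terms, again just MACs (second time slot)
    new_w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return quantize(np.array(new_w), bits)  # keep weights on the low-bitwidth grid

# Toy usage: recover y = 2*x0 + 3*x1 from noiseless samples.
w = np.zeros(2)
for _ in range(500):
    x = np.random.randn(2)
    y = 2 * x[0] + 3 * x[1]
    w = sgd_step(w, x, y)
print(np.round(w, 2))  # close to [2., 3.]
```

The point of the sketch is only that the forward pass and the weight update reduce to the same accumulate-of-products structure, which is what allows the same arithmetic units to be reused across the two phases.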
