Unreliable memory operation on a convolutional neural network processor

The evolution of convolutional neural networks (CNNs) into more complex organizations, with additional layers, larger convolutions, and denser connectivity, has established the state of the art in accuracy for image detection and classification challenges. As these networks have grown to require gigabytes of memory for their operation, it has become essential to understand how their inference capability degrades when data elements are corrupted in memory. This paper introduces fault injection into these systems by simulating failing bit-cells in hardware memories, relaxing the assumption of 100% reliable operation. We analyze the inference behavior of these networks under severe fault-injection rates and apply fault-mitigation strategies to improve CNN resilience. For the MNIST dataset, we show that the feature-map memory space can be reduced by 8x, and that under sub-100% reliable operation, fault-injection rates up to 10^-1 (with most-significant-bit protection) incur only a 1% degradation in error probability. Furthermore, by offloading the feature-map memory to an embedded dynamic RAM (eDRAM) system in technology nodes from 65 nm down to 28 nm, power-efficiency improvements of 73-80% can be obtained.
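To illustrate the kind of fault model described above, the sketch below (not the paper's actual simulator) flips each bit of a fixed-point feature-map word independently with a given per-bit fault rate, optionally sparing the most significant bit; the 8-bit word width and 10% fault rate are assumptions chosen to mirror the abstract's figures.

```python
import random

def inject_faults(value, bit_width=8, fault_rate=0.1, protect_msb=True, rng=None):
    """Flip each bit of an unsigned fixed-point word independently with
    probability `fault_rate`, optionally protecting the most significant bit."""
    rng = rng or random
    flippable = bit_width - 1 if protect_msb else bit_width
    for bit in range(flippable):
        if rng.random() < fault_rate:
            value ^= 1 << bit  # emulate a failing bit-cell at this position
    return value

# Corrupt a small 8-bit feature-map buffer at a 10% per-bit fault rate,
# keeping the MSB protected as in the abstract's mitigation strategy.
rng = random.Random(0)
fmap = [200, 13, 97, 255]
faulty = [inject_faults(v, fault_rate=0.1, rng=rng) for v in fmap]
```

With MSB protection enabled, the corrupted word can differ from the original only in its low-order bits, which bounds the magnitude of the injected error and is one intuition for why high fault rates cost so little accuracy.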
