Low-Cost Error Detection in Deep Neural Network Accelerators with Linear Algorithmic Checksums

The widespread adoption of deep neural networks in safety-critical systems requires a careful examination of the safety implications of hardware errors. We confirm that this concern is warranted by demonstrating the potentially catastrophic impact of hardware bit errors on DNN accuracy. The resulting need for fault tolerance methods that are comprehensive yet cheap enough for the tight cost margins of consumer deep learning applications can be met by rigorously exploiting the mathematical properties of DNN computations. Our technique, Sanity-Check, detects errors in fully-connected and convolutional layers using linear algorithmic checksums. Because Sanity-Check can be implemented purely in software, it can be deployed on a variety of off-the-shelf execution platforms with no hardware modification. We further propose a dedicated hardware unit that integrates seamlessly with modern deep learning accelerators and eliminates the performance overhead of the software implementation at a negligible area and power cost. In our error injection experiments, Sanity-Check delivers perfect critical error coverage, making it a promising alternative for low-cost error detection in safety-critical deep neural network applications.
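
To make the checksum idea concrete, the sketch below shows how a linear checksum can guard a fully-connected layer: since the layer is linear, the sum of its outputs must equal a precomputed column-sum row of the weight matrix applied to the input, plus the sum of the biases. This is a minimal illustrative sketch, not the paper's implementation; the function name, tolerance, and NumPy formulation are assumptions. The same identity extends to convolutional layers, which are also linear.

import numpy as np

def checked_fc_layer(W, b, x, rtol=1e-4):
    """Fully-connected layer guarded by a linear algorithmic checksum.

    Linearity implies sum(W @ x + b) == (column sums of W) @ x + sum(b).
    A mismatch beyond floating-point tolerance flags a possible
    hardware error in the layer computation.
    (Illustrative sketch; names and tolerance are assumptions.)
    """
    col_sums = W.sum(axis=0)           # checksum weights; precomputed once offline
    y = W @ x + b                      # the layer computation being protected
    expected = col_sums @ x + b.sum()  # checksum prediction of sum(y)
    if not np.isclose(y.sum(), expected, rtol=rtol):
        raise RuntimeError("sanity check failed: possible hardware error")
    return y

# Example usage with random layer parameters and input:
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128)).astype(np.float32)
b = rng.standard_normal(256).astype(np.float32)
x = rng.standard_normal(128).astype(np.float32)
y = checked_fc_layer(W, b, x)

The checksum adds only one dot product and two reductions per layer invocation, which is the source of the low-cost claim; the tolerance must be tuned to the platform's floating-point behavior to avoid false alarms.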