Quantification of the Impact of Random Hardware Faults on Safety-Critical AI Applications: CNN-Based Traffic Sign Recognition Case Study

Artificial Intelligence (AI) is rapidly entering almost every safety-critical domain, including the automotive industry. The next generation of functional safety standards must define appropriate verification and validation techniques and propose adequate fault tolerance mechanisms. Several AI frameworks, such as Google's TensorFlow, have already proven to be effective and reliable platforms. However, like any other software, AI-based applications are exposed to random hardware faults, e.g., bit-flips in RAM or CPU registers, which can lead to silent data corruption. It is therefore crucial to understand how different hardware faults affect the accuracy of AI applications. This paper introduces our new fault injection framework for TensorFlow and presents the results of the first experiments, conducted on a Convolutional Neural Network (CNN) based traffic sign classifier. These results demonstrate the feasibility of the fault injection framework and, in particular, help to identify the most critical parts of the neural network under test.
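
To make the fault model concrete, the following minimal sketch shows what a single bit-flip can do to one stored network parameter. It is not the fault injection framework introduced in this paper. It assumes that the parameter is held as an IEEE 754 single-precision (float32) value, and the helper name `flip_bit` and the example weight 0.5 are hypothetical, chosen only for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, stored as float32, with exactly one bit inverted.

    Hypothetical helper for illustration only. Bit 0 is the least
    significant mantissa bit, bits 23-30 form the exponent, and bit 31
    is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the selected bit, emulating a single random hardware fault.
    raw ^= 1 << bit
    # Reinterpret the corrupted pattern as a float32 again.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw))
    return faulty


if __name__ == "__main__":
    weight = 0.5  # assumed example value of a single network weight
    for bit in (0, 23, 30, 31):
        print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)!r}")
```

In this sketch, a flip in the lowest mantissa bit changes the value only marginally, whereas a flip in the highest exponent bit turns 0.5 into 2^127. This illustrates why the location of a fault may matter as much as its occurrence, and why such corruption can remain silent: the corrupted value is still a valid floating-point number.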