Analyzing and Improving Fault Tolerance of Learning-Based Navigation Systems

Learning-based navigation systems are widely used in autonomous applications such as robotics, unmanned vehicles, and drones. Specialized hardware accelerators have been proposed to deliver high performance and energy efficiency for such navigation tasks. However, transient and permanent faults are becoming more common in hardware systems and can catastrophically violate task safety. Meanwhile, traditional redundancy-based protection methods are difficult to deploy in resource-constrained edge applications. In this paper, we experimentally evaluate the resilience of navigation systems with respect to algorithms, fault models, and data types during both RL training and inference. We further propose two efficient fault mitigation techniques that improve success rate by $2 \times$ and quality-of-flight by 39% in learning-based navigation systems.
