MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles

Reliability and safety are critical in autonomous machine services, such as autonomous vehicles and aerial drones. In this paper, we first present MAVFI, an open-source reliability analysis framework for Micro Aerial Vehicles (MAVs), to characterize the impact of transient faults on end-to-end flight metrics, e.g., flight time and success rate. Using our framework, we observe that end-to-end fault tolerance analysis is essential for characterizing system reliability. We demonstrate that the planning and control stages are more vulnerable to transient faults than the visual perception stage in the common “Perception-Planning-Control (PPC)” compute pipeline. Furthermore, to improve the reliability of the MAV system, we propose two low-overhead anomaly-based transient fault detection and recovery schemes, based on Gaussian statistical models and autoencoder neural networks respectively. We validate our anomaly-based fault protection schemes in a variety of simulated photo-realistic environments on both an Intel i9 CPU and the ARM Cortex-A57 on the Nvidia TX2 platform. We demonstrate that the autoencoder-based scheme can improve system reliability by recovering 100% of failure cases with less than 0.0062% computational overhead in best-case scenarios. In addition, the MAVFI framework can be used for other ROS-based cyber-physical applications and is open-sourced at https://github.com/harvard-edge/MAVBench/tree/mavfi.
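
To make the Gaussian-based detection idea concrete, the following is a minimal sketch and not the MAVFI implementation. It assumes a single scalar signal (for example, one command value passed between pipeline stages), a running mean and variance, and an n-sigma threshold. The names `GaussianDetector` and `filter_stream`, the 3-sigma default, the warm-up length, and the "reuse the last good value" recovery are all illustrative assumptions; the abstract does not specify any of them.

```python
import math


class GaussianDetector:
    """Running Gaussian model of one signal with an n-sigma anomaly test.

    Hypothetical helper for illustration; not part of MAVFI.
    """

    def __init__(self, n_sigma=3.0, warmup=10):
        self.n_sigma = n_sigma  # assumed threshold, in standard deviations
        self.warmup = warmup    # samples to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0           # running sum of squared deviations (Welford)

    def std(self):
        # Sample standard deviation; zero until two samples have been seen.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))

    def is_anomaly(self, x):
        # Never flag during warm-up, since the statistics are unreliable.
        if self.count < self.warmup:
            return False
        return abs(x - self.mean) > self.n_sigma * self.std()

    def update(self, x):
        # Welford's online update of the mean and squared deviations.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)


def filter_stream(values, detector):
    """Return (cleaned_values, flagged_indices).

    Assumed recovery policy: a flagged sample is replaced by the last
    accepted value and is not used to update the model.
    """
    cleaned, flagged = [], []
    last_good = None
    for i, x in enumerate(values):
        if last_good is not None and detector.is_anomaly(x):
            flagged.append(i)
            cleaned.append(last_good)
        else:
            detector.update(x)
            last_good = x
            cleaned.append(x)
    return cleaned, flagged
```

In this sketch, a value corrupted by a bit flip in a high-order bit would typically fall far outside the learned range and be replaced, while normal variation keeps updating the model. Excluding flagged samples from the update keeps a single fault from inflating the variance and masking later faults.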
