Challenges of Reliability Assessment and Enhancement in Autonomous Systems

The complexity and heterogeneity of today's advanced cyber-physical systems and systems of systems are compounded by the novel computing architectures used to implement artificial-intelligence-based autonomy. Overall system reliability is then tied to requirements for fail-safe and fail-operational modes that are specific to the target application of the autonomous system and to the adopted HW architecture. This paper gives an overview of the reliability challenges of implementing intelligence in autonomous systems on HW backbones such as neuromorphic architectures, approximate computing architectures, GPUs, tensor processing units (TPUs), and SoC FPGAs.
