Proposal: Identification of Software Failures in Complex Systems Using Low-Level Execution Data

Autonomous and robotics systems (ARSs), systems designed to react to environmental stimuli independently and without human supervision, are complex and difficult to supervise, and they make up an increasingly large portion of the systems being developed and deployed. Quality assurance for these systems is challenging, and their software contains many faults. My key insight is that a characterization of typical program behavior provides a basis for determining whether a program is operating within its normal parameters. To leverage this insight, I use low-level monitoring to record summaries of execution behavior that characterize each execution. Aggregating this low-level data over many executions creates a picture of typical program behavior; behavior that differs from this picture may indicate unintended behavior. My techniques feed the data to machine learning algorithms that build models of expected behavior; these models then analyze individual program executions to predict whether a given execution represents typical behavior.

My core thesis is: Low-level execution signals, recorded over multiple executions of a robotics program or portion thereof, can be used to create machine learning models that, in turn, can be used to predict whether signals from previously-unseen executions represent usual or unusual behavior. The combination of low-level instrumentation and models can provide predictions with reasonable trade-offs between prediction accuracy, instrumentation intrusiveness, and computational efficiency.

To support this thesis, I demonstrate the efficacy of these techniques in detecting software failures on small programs, in simulation on the ARDUPILOT autonomous vehicle system, and on other ARSs built on the Robot Operating System (ROS). I observe that ARSs are well-suited to low-level monitoring because they are cyber-physical: although such monitoring may create intolerable overhead in other settings, these distributed systems interact with the real world and therefore have time or cycles that would otherwise be spent waiting for real-world events, leaving them well-situated to absorb the overhead that monitoring generates. However, ARSs often do have timing-sensitive components, for example, deadlines and timeouts that, if missed, cause the system to abort. To that end, I measure the extent to which ARSs can absorb artificially-inserted timing delays.
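
To illustrate the modeling step, the following is a minimal sketch, not the dissertation's implementation: assuming each execution is summarized as a fixed-length vector of low-level event totals (the specific features and values below are hypothetical placeholders), a novelty-detection model such as scikit-learn's one-class SVM can be trained on nominal executions and then asked to label a previously-unseen execution as typical or unusual.

```python
# Minimal sketch: train a novelty-detection model on per-execution
# summaries of low-level signals, then score an unseen execution.
# Feature choices and numbers are hypothetical placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Each row summarizes one nominal execution, e.g., totals of low-level
# events (instructions retired, branches, cache misses, system calls).
nominal_runs = np.array([
    [1.02e9, 2.1e8, 3.4e6, 1.2e4],
    [1.05e9, 2.2e8, 3.3e6, 1.1e4],
    [0.98e9, 2.0e8, 3.5e6, 1.3e4],
    # ... many more executions in practice
])

# Normalize features, then fit a one-class model of "typical" behavior.
scaler = StandardScaler().fit(nominal_runs)
model = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale")
model.fit(scaler.transform(nominal_runs))

# Score a previously-unseen execution: +1 means typical, -1 means unusual.
new_run = np.array([[1.9e9, 4.0e8, 9.9e6, 5.0e4]])
prediction = model.predict(scaler.transform(new_run))[0]
print("typical" if prediction == 1 else "unusual")
```

Any novelty- or outlier-detection algorithm with a comparable fit/predict interface could be substituted; the essential design choice is that the model is built only from executions assumed to be nominal and is queried one execution at a time.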

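The timing-delay measurement can be sketched in the same spirit. The harness below is purely illustrative and makes up its own numbers: it assumes a hypothetical per-cycle deadline and a stand-in control step, injects an artificial delay into each cycle, and reports how often the deadline is still met.

```python
# Illustrative sketch (not the dissertation's harness): inject an
# artificial delay before a periodic control step and check whether a
# deadline is still met. All names, durations, and thresholds are
# hypothetical.
import time

DEADLINE_S = 0.05        # hypothetical per-cycle deadline (50 ms)
INJECTED_DELAY_S = 0.01  # artificial delay under test (10 ms)

def control_step():
    """Stand-in for one cycle of the system's control loop."""
    time.sleep(0.02)  # pretend the real work takes about 20 ms

def run_cycle_with_delay(delay_s):
    """Run one cycle with an injected delay; return True if the deadline held."""
    start = time.monotonic()
    time.sleep(delay_s)  # artificially-inserted delay
    control_step()
    elapsed = time.monotonic() - start
    return elapsed <= DEADLINE_S

if __name__ == "__main__":
    met = sum(run_cycle_with_delay(INJECTED_DELAY_S) for _ in range(100))
    print(f"deadline met in {met}/100 delayed cycles")
```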