Crashing Simulated Planes is Cheap: Can Simulation Detect Robotics Bugs Early?

Robotics and autonomy systems are becoming increasingly important, moving from specialised factory settings to general-purpose, consumer-facing applications. As these systems become ubiquitous, there is a commensurate need to protect against potentially catastrophic harm. System-level testing in simulation is a particularly promising approach for assuring robotics systems: it allows more extensive testing in realistic scenarios and can expose bugs that do not manifest at the unit level. Ideally, such testing would find critical bugs well before expensive field testing is required. However, simulations can only model coarse abstractions of the environment, contributing to a common perception that robotics bugs can only be found in live deployment. To address this gap, we conduct an empirical study of bugs that have been fixed in the widely used, open-source ArduPilot system. We identify bug-fixing commits by exploiting commenting conventions in the version-control history, and we provide a quantitative and qualitative evaluation of the resulting bugs, characterising how they are triggered and how they can be detected, with the goal of determining how best to find them in simulation, well before field testing. To our surprise, we find that the majority of bugs manifest under simple conditions that are easily reproduced in software-based simulation. At the same time, we find that system configuration and the form of input play an important role in triggering bugs. We use these results to inform a novel framework for detecting these and other bugs in simulation, consistently and reproducibly. These contributions can guide the construction of automated testing techniques for robotics systems, with the goal of finding bugs early and cheaply, without incurring the costs of testing for bugs in live physical systems.
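As a concrete illustration of the commit-mining step, the sketch below scans a Git history for commits whose messages match common bug-fix keywords. It assumes `git` is on the PATH and is run inside a repository checkout; the keyword pattern and the `bugfix_commits` helper are illustrative assumptions, not the exact conventions analysed in the study.

```python
import re
import subprocess

# Keywords commonly used in commit subjects to flag bug fixes.
# This pattern is an illustrative assumption, not the study's exact convention.
BUGFIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bug|crash|regression)\b", re.IGNORECASE)

def bugfix_commits(repo_path="."):
    """Yield (sha, subject) pairs for commits whose subject suggests a bug fix."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%H\t%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        sha, _, subject = line.partition("\t")
        if BUGFIX_PATTERN.search(subject):
            yield sha, subject

if __name__ == "__main__":
    for sha, subject in bugfix_commits():
        print(sha[:10], subject)
```

In practice, purely keyword-based mining over-approximates the set of bug fixes, so a manual filtering pass over the candidate commits is usually still needed.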
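The kind of simulation-level checking the framework targets can be approximated with ArduPilot's software-in-the-loop (SITL) simulator. The sketch below assumes a SITL instance launched separately (e.g. via ArduPilot's sim_vehicle.py) and uses the pymavlink library to watch telemetry for a violation of a simple safety property; the endpoint, the altitude bound, and the `watch_altitude` helper are hypothetical choices for illustration, not the paper's framework.

```python
from pymavlink import mavutil

MAX_ALT_M = 120.0  # illustrative safety envelope, not taken from the paper

def watch_altitude(endpoint="udp:127.0.0.1:14550", n_samples=100):
    """Fail fast if the simulated vehicle leaves the altitude envelope."""
    conn = mavutil.mavlink_connection(endpoint)
    conn.wait_heartbeat()  # block until the SITL vehicle is up and talking
    for _ in range(n_samples):
        msg = conn.recv_match(type="GLOBAL_POSITION_INT", blocking=True, timeout=10)
        if msg is None:
            raise TimeoutError("no position report received from SITL")
        alt_m = msg.relative_alt / 1000.0  # relative_alt is reported in millimetres
        assert alt_m <= MAX_ALT_M, f"altitude envelope violated: {alt_m:.1f} m"

if __name__ == "__main__":
    watch_altitude()
```

Because the check runs entirely against a software simulator, a violated assertion here costs a failed test run rather than a crashed airframe, which is precisely the economy the title alludes to.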
