Intermittently failing tests in the embedded systems domain

Software testing is sometimes plagued by intermittently failing tests, and finding the root causes of such failures is often difficult. This problem has been widely studied at the unit testing level for open source software, but there has been far less investigation at the system test level, particularly in the testing of industrial embedded systems. This paper describes our investigation of the root causes of intermittently failing tests in the embedded systems domain, with the goal of better understanding, explaining and categorizing the underlying faults. The subject of our investigation is a currently running industrial embedded system, along with the system-level testing that was performed on it. We devised and used a novel metric for classifying test cases as intermittent. From more than half a million test verdicts, we identified intermittently and consistently failing tests, and determined their root causes using multiple sources. We found that about 1-3% of all test cases were intermittently failing. From analysis of the case study results and related work, we identified nine factors associated with test case intermittence. We found that a fix for a consistently failing test typically removed a larger number of failures detected by other tests than a fix for an intermittently failing test did. We also found that more effort was usually needed to identify fixes for intermittently failing tests than for consistently failing ones. We further identified an overlap between the root causes of intermittently and consistently failing tests. Many root causes of intermittence are the same in industrial embedded systems and open source software. However, when comparing unit testing to system-level testing, especially for embedded systems, we observed that the test environment itself is often the cause of intermittence.
