Assessing the quality of industrial avionics software: an extensive empirical evaluation

A real-time operating system for avionics (RTOS4A) provides the operating environment for avionics application software. Because an RTOS4A hosts safety-critical applications, demonstrating a satisfactory level of quality to its stakeholders is essential. By assessing how quality varied across consecutive releases of an industrial RTOS4A, based on test data collected over 17 months, we aim to provide guidelines to 1) improve test effectiveness, and thus the quality of subsequent RTOS4A releases, and 2) assess the quality of other systems from test data in a similar way. We carefully defined a set of research questions and, from the available test data, derived a set of variables for answering them, including release and measures of test effort, test effectiveness, complexity, test efficiency, test strength, and failure density. Using these variables, we assessed quality in terms of the number of failures found in tests by applying a combination of analyses: trend analysis using two-dimensional graphs, correlation analysis using Spearman's test, and difference analysis using the Wilcoxon rank test. Key results include the following: 1) the number of failures and the failure density decreased in the latest releases, and test coverage was either high or did not decrease from release to release; 2) greater test effort was spent on modules of greater complexity, and the number of failures in these modules was not high; and 3) across all releases, test coverage for modules in which no failures were uncovered was not lower than for modules with failures. The overall assessment, based on this evidence, is that the quality of the RTOS4A studied improved in the latest release. In addition, our industrial partner found our guidelines useful, and we believe they can be used to assess the quality of other applications in the future.
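
To make the analysis pipeline concrete, the following is a minimal Python sketch, not the authors' code, of two of the statistical analyses named above, using scipy.stats. The per-module sample data and variable names are hypothetical, and interpreting the abstract's "Wilcoxon rank test" as the rank-sum variant is an assumption for illustration.

```python
# A minimal sketch (not the authors' code) of the analyses the abstract names:
# Spearman correlation between per-module metrics, and a Wilcoxon rank-sum
# test comparing coverage of modules with vs. without failures.
# All sample data and variable names below are hypothetical.

from scipy.stats import spearmanr, ranksums

# Hypothetical per-module measurements for one release.
complexity  = [12, 45, 7, 33, 21, 60, 15]    # e.g., cyclomatic complexity
test_effort = [30, 90, 10, 70, 40, 120, 25]  # e.g., test person-hours
coverage    = [0.92, 0.88, 0.95, 0.90, 0.93, 0.85, 0.97]
has_failure = [False, True, False, True, False, True, False]

# Correlation analysis: do more complex modules receive more test effort?
rho, p_corr = spearmanr(complexity, test_effort)
print(f"Spearman rho = {rho:.2f} (p = {p_corr:.3f})")

# Difference analysis: does coverage differ between modules where failures
# were uncovered and modules where none were?
cov_fail   = [c for c, f in zip(coverage, has_failure) if f]
cov_nofail = [c for c, f in zip(coverage, has_failure) if not f]
stat, p_diff = ranksums(cov_fail, cov_nofail)
print(f"Wilcoxon rank-sum statistic = {stat:.2f} (p = {p_diff:.3f})")
```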
