Reliability-Oriented Verification of Mission-Critical Software Systems

As software systems are increasingly employed in critical contexts, assuring high reliability for large, complex systems can incur huge verification costs. Developers of critical systems often have serious difficulty satisfying reliability requirements at competitive and acceptable cost and time. Currently, it is not clear how engineers should plan an effective verification strategy oriented toward improving final reliability, since it is not trivial to determine which activities mainly impact the reliability-cost trade-off and how much they affect reliability. Most often, crucial choices in the verification activity are left to the engineers' intuition: they base their decisions on personal expertise and past experience, because convincing approaches to support these choices are lacking. However, when dealing with high reliability targets and tight time/cost constraints, the engineers responsible for verification should have quantitative evidence of the consequences of their choices and base their decisions on it. One fundamental aspect of a reliability-oriented verification process is the identification of the most critical parts of the system, i.e., the major contributors to its unreliability. This is crucial for distributing verification effort appropriately. However, even with effort suitably allocated, engineers should know which verification techniques most impact the final reliability and which are best suited to the features of the system under test. Hence, the proper selection of the verification techniques that best fit the specific system being developed is another critical challenge. By addressing these issues, engineers could tune a verification process for their systems by following quantitative reasoning that highlights the costs and benefits of each choice. Based on these considerations, the thesis proposes a solution for carrying out effective verification specifically oriented toward improving reliability.
It aims to provide engineers with quantitative means, to be adopted and embedded in their process, that allow them to allocate effort appropriately and select techniques for the system under test. The thesis first identifies the major open challenges, determining the most crucial steps that engineers need to take for effective planning. Then, to address them, it proposes: i) an optimization model that allocates verification effort to the different system components in order to achieve a required reliability level at minimum verification cost; ii) an approach, based on empirical analyses, to quantitatively support the selection of the best verification techniques; iii) a procedure to improve verification processes in the considered class of systems, able to iteratively refine results across successive projects.
