Online Monitoring of Software System Reliability

Reliability is one of the major concerns for software engineers. The increasing size of software systems and their inherent complexity, which is essentially related to the intricate interdependencies among many heterogeneous components, make reliability difficult to assess and assure. The actual runtime behavior of a system is difficult to forecast during the development phase, and relying only on sound design and testing techniques is often not sufficient to deliver highly reliable systems. To guarantee high reliability, system behavior needs to be monitored at runtime, and system reliability needs to be periodically estimated during operation, taking into account both structural/static and behavioral/dynamic information. In this paper, we propose an online reliability monitoring approach that combines static reliability modeling and dynamic analysis to periodically evaluate the system reliability trend during operation. Its usage is illustrated by a prototype implementation and a case study.
