Extracting Interaction-Related Failure Indicators for Online Detection and Prediction of Content Failures

As software-intensive systems grow more complex, software health management has been proposed to assure their runtime dependability, and online failure detection and prediction is one of its most significant components. Failure indicators are characteristics of a system's internal states and behavior that point to potential failures. However, previous studies mostly extracted failure indicators from the network and hardware outside a software system, or from the operating-system level, and neglected runtime dynamics at the application level. Moreover, most of these studies aimed at detecting and predicting performance-related failures, so content failures, a major category of software failures, are often omitted. This paper proposes an experiment-based approach to extracting interaction-related failure indicators at the application level for content failures. The indicators are composed of abnormal execution time of modules and abnormal interaction times between modules. First, because real-world failure data that reflect the runtime states and behavior of a software system are lacking, an experiment-based method for generating failure data is proposed. Then a machine learning method is selected and applied to the failure dataset to construct classifiers that separate normal data from failure data, and the failure indicators are extracted from these classifiers. Finally, three open-source software systems were selected to show the validity of the extraction method and the effectiveness of the extracted failure indicators. The interaction-related failure indicators extracted by the proposed approach can be used for runtime detection and prediction of content failures, thus improving the runtime dependability of complex software-intensive systems.
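The step of extracting indicators from a classifier trained on normal and failure data can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not name the learning algorithm, so a one-feature threshold rule (a decision stump) stands in for it here. The feature names, the sample values, the `min_accuracy` cutoff, and the reading of "interaction times" as call counts between modules are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Run = Dict[str, float]  # hypothetical feature name -> value observed in one run
Stump = Tuple[Optional[float], str, float]  # (threshold, direction, accuracy)


def best_stump(values: List[float], labels: List[bool]) -> Stump:
    """Find the single threshold on one feature that best separates
    failure runs (label True) from normal runs (label False)."""
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best: Stump = (None, ">", 0.0)
    for t in candidates:
        for direction in (">", "<"):
            correct = sum(
                ((v > t) if direction == ">" else (v < t)) == is_failure
                for v, is_failure in zip(values, labels)
            )
            accuracy = correct / len(values)
            if accuracy > best[2]:
                best = (t, direction, accuracy)
    return best


def extract_indicators(
    runs: List[Run], labels: List[bool], min_accuracy: float
) -> List[Tuple[str, float, str, float]]:
    """Keep features whose best threshold rule reaches min_accuracy.
    Each result is (feature, threshold, direction, accuracy)."""
    indicators = []
    for feature in sorted(runs[0]):
        values = [run[feature] for run in runs]
        threshold, direction, accuracy = best_stump(values, labels)
        if threshold is not None and accuracy >= min_accuracy:
            indicators.append((feature, threshold, direction, accuracy))
    return indicators


# Made-up data: three normal runs followed by three failure runs.
runs = [
    {"exec_time_ms:parser": 12.0, "calls:parser->store": 3},
    {"exec_time_ms:parser": 11.0, "calls:parser->store": 3},
    {"exec_time_ms:parser": 13.0, "calls:parser->store": 4},
    {"exec_time_ms:parser": 12.5, "calls:parser->store": 9},
    {"exec_time_ms:parser": 11.5, "calls:parser->store": 8},
    {"exec_time_ms:parser": 12.0, "calls:parser->store": 10},
]
labels = [False, False, False, True, True, True]

indicators = extract_indicators(runs, labels, min_accuracy=0.9)
for feature, threshold, direction, accuracy in indicators:
    print(f"{feature} {direction} {threshold} (accuracy {accuracy:.2f})")
```

In this toy dataset only the call-count feature separates the two classes, so it is the only rule retained as a candidate indicator. A real study would train a fuller classifier on far more data, with held-out evaluation and handling of class imbalance.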
