An empirical exploratory study on operating system reliability

High-reliability applications run on top of commodity operating systems, which must therefore provide highly reliable services. In this paper, we conduct an empirical exploratory study on OS reliability. We analyze more than 30,000 real OS failure records collected from different workplace environments. The results show that most OS failures originate in OS services rather than in OS applications or the OS kernel. The Gamma and Weibull distributions provided the best fit to the OS failure data. We also found that OS kernel failures are more prevalent in enterprise workplaces than in academic ones: the observed failure rate was higher in the former than in the latter.
