Why PCs are fragile and what we can do about it: a study of Windows registry problems

Software configuration problems are a major source of failures in computer systems. In this paper, we present a new framework for categorizing configuration problems. We apply this categorization to Windows registry-related problems obtained from various internal and external sources. Although infrequent, registry-related problems are difficult to diagnose and repair, and consequently they frustrate users. We classify problems by their manifestation and scope of impact to gain insight into how problems affect users and why PCs are fragile. We then describe techniques to identify and eliminate such registry failures: health predicate monitoring for detecting known problems, fault injection for improving application robustness, and access protection mechanisms for preventing fragility problems.
