Automatic Software Fault Diagnosis by Exploiting Application Signatures

Application problem diagnosis in complex enterprise environments is a challenging problem, and contributes significantly to the growth in IT management costs. While application problems have a large number of possible causes, failures due to runtime interactions with the system environment (e.g., configuration files, resource limitations, access permissions) are one of the most common categories. Troubleshooting these problems requires extensive experience and time, and is very difficult to automate. In this paper, we propose a black-box approach that can automatically diagnose several classes of application faults using applications' runtime behaviors. These behaviors along with various system states are combined to create signatures that serve as a baseline of normal behavior. When an application fails, the faulty behavior is analyzed against the signature to identify deviations from expected behavior and likely cause. We implement a diagnostic tool based on this approach and demonstrate its effectiveness in a number of case studies with realistic problems in widely-used applications. We also conduct a number of experiments to show that the impact of the diagnostic tool on application performance (with some modifications of platform tracing facilities), as well as storage requirements for signatures, are both reasonably low.

[1]  Shan Lu,et al.  Flight data recorder: monitoring persistent-state interactions to improve systems management , 2006, OSDI '06.

[2]  Steven P. Reiss,et al.  Fault localization with nearest neighbor queries , 2003, 18th IEEE International Conference on Automated Software Engineering, 2003. Proceedings..

[3]  Sudheendra Hangal,et al.  Tracking down software bugs using automatic anomaly detection , 2002, ICSE '02.

[4]  F. Massey The Kolmogorov-Smirnov Test for Goodness of Fit , 1951 .

[5]  Roy T. Fielding,et al.  The Apache HTTP Server Project , 1997, IEEE Internet Comput..

[6]  Barak A. Pearlmutter,et al.  Detecting intrusions using system calls: alternative data models , 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).

[7]  David A. Wagner,et al.  Intrusion detection via static analysis , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[8]  Steven D. Gribble,et al.  Configuration Debugging as Search: Finding the Needle in the Haystack , 2004, OSDI.

[9]  Helen J. Wang,et al.  Automatic Misconfiguration Troubleshooting with PeerPressure , 2004, OSDI.

[10]  Weibo Gong,et al.  Anomaly detection using call stack information , 2003, 2003 Symposium on Security and Privacy, 2003..

[11]  Mona Attariyan,et al.  AutoBash: improving configuration management with operating system causality analysis , 2007, SOSP.

[12]  Stephanie Forrest,et al.  Intrusion Detection Using Sequences of System Calls , 1998, J. Comput. Secur..

[13]  Wei-Ying Ma,et al.  Automated known problem diagnosis with event traces , 2006, EuroSys.

[14]  Anees Shaikh,et al.  PDA: A Tool for Automated Problem Determination , 2007, LISA.

[15]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[16]  William G. Griswold,et al.  Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[17]  Gregg Rothermel,et al.  An empirical investigation of the relationship between spectra differences and regression faults , 2000, Softw. Test. Verification Reliab..

[18]  Donald E. Porter,et al.  Improved error reporting for software that uses black-box components , 2007, PLDI '07.

[19]  Helen J. Wang,et al.  Strider: a black-box, state-based approach to change and configuration management and support , 2003, Sci. Comput. Program..