Leveraging many simple statistical models to adaptively monitor software systems

Self-managing systems require continuous monitoring to ensure correct operation, but detailed monitoring is often too costly to use in production. An alternative is adaptive monitoring, whereby monitoring is kept to a minimal level while the system behaves as expected and is increased only when a problem is suspected. To enable such an approach, we must model the system at two levels: a minimal level, to confirm correct operation, and a detailed level, to diagnose faulty components. To avoid the complexity of developing an explicit model based on the system structure, we employ simple statistical techniques to identify relationships in the monitored data. These relationships are used to characterize normal operation and to identify problematic areas. We develop and evaluate a prototype for the adaptive monitoring of J2EE applications. We experiment with 29 different fault scenarios of three general types and show that we detect the presence of faults in 80% of cases; all but one of the undetected cases are attributable to a single fault type. We shortlist the faulty component in 65% of the cases where anomalies are observed.
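The idea of characterizing normal operation through relationships in the monitored data can be illustrated with a minimal sketch. It assumes that the "simple statistical techniques" are pairwise linear regressions between metrics, which the abstract does not state; the metric names, thresholds (`min_r2`, `slack`), and sample values are hypothetical and are not taken from the prototype.

```python
from itertools import combinations
from statistics import mean


def fit_pair(xs, ys):
    """Least-squares line y = a + b*x. Returns (a, b, r2, max_abs_residual)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant metric carries no usable relationship
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy * sxy / (sxx * syy)
    max_res = max(abs(y - (a + b * x)) for x, y in zip(xs, ys))
    return a, b, r2, max_res


def learn_models(training, min_r2=0.9):
    """Keep only metric pairs whose linear fit is strong on fault-free data."""
    models = {}
    for m1, m2 in combinations(sorted(training), 2):
        fit = fit_pair(training[m1], training[m2])
        if fit is not None and fit[2] >= min_r2:
            models[(m1, m2)] = fit
    return models


def check_sample(models, sample, slack=1.5):
    """Return the metric pairs whose learned relationship the sample violates."""
    violated = []
    for (m1, m2), (a, b, _r2, max_res) in models.items():
        residual = abs(sample[m2] - (a + b * sample[m1]))
        if residual > slack * max_res + 1e-9:
            violated.append((m1, m2))
    return violated


def shortlist(violated):
    """Rank metrics by how many violated relationships they take part in."""
    counts = {}
    for m1, m2 in violated:
        counts[m1] = counts.get(m1, 0) + 1
        counts[m2] = counts.get(m2, 0) + 1
    return sorted(counts, key=lambda m: (-counts[m], m))


# Hypothetical fault-free training data: one value per sampling interval.
training = {
    "requests": [10, 20, 30, 40, 50],
    "db_calls": [21, 39, 61, 79, 101],
    "cache_hits": [5.2, 9.9, 15.1, 19.8, 25.0],
}
models = learn_models(training)

normal = {"requests": 30, "db_calls": 60, "cache_hits": 15}
faulty = {"requests": 30, "db_calls": 5, "cache_hits": 15}

print(check_sample(models, normal))             # no violated relationships
print(shortlist(check_sample(models, faulty)))  # db_calls ranked first
```

In an adaptive setting, a non-empty result from `check_sample` on the minimal metric set would be the trigger for enabling more detailed monitoring, and the ranking from `shortlist` suggests where to look first.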
