Anomaly? application change? or workload change? towards automated detection of application performance anomaly and change

Automated tools for understanding application behavior and its changes during the application life-cycle are essential for many performance analysis and debugging tasks. Application performance issues have an immediate impact on customer experience and satisfaction. A sudden slowdown of enterprise-wide application can effect a large population of customers, lead to delayed projects and ultimately can result in company financial loss. We believe that online performance modeling should be a part of routine application monitoring. Early, informative warnings on significant changes in application performance should help service providers to timely identify and prevent performance problems and their negative impact on the service. We propose a novel framework for automated anomaly detection and application change analysis. It is based on integration of two complementary techniques: i) a regression-based transaction model that reflects a resource consumption model of the application, and ii) an application performance signature that provides a compact model of run-time behavior of the application. The proposed integrated framework provides a simple and powerful solution for anomaly detection and analysis of essential performance changes in application behavior. An additional benefit of the proposed approach is its simplicity: it is not intrusive and is based on monitoring data that is typically available in enterprise production environments.

[1]  N. Draper,et al.  Applied Regression Analysis: Draper/Applied Regression Analysis , 1998 .

[2]  Evgenia Smirni,et al.  Analysis of application performance and its change via representative application signatures , 2008, NOMS 2008 - 2008 IEEE Network Operations and Management Symposium.

[3]  Armando Fox,et al.  Capturing, indexing, clustering, and retrieving system history , 2005, SOSP '05.

[4]  David A. Patterson,et al.  Path-Based Failure and Evolution Management , 2004, NSDI.

[5]  Marcos K. Aguilera,et al.  Performance debugging for distributed systems of black boxes , 2003, SOSP '03.

[6]  Christopher Stewart,et al.  Exploiting nonstationarity for performance prediction , 2007, EuroSys '07.

[7]  Richard Mortier,et al.  Using Magpie for Request Extraction and Workload Modelling , 2004, OSDI.

[8]  Amin Vahdat,et al.  Measuring and characterizing end-to-end Internet service performance , 2003, TOIT.

[9]  Magnus Karlsson,et al.  Dynamics and evolution of Web sites: analysis, metrics and design issues , 2001, Proceedings. Sixth IEEE Symposium on Computers and Communications.

[10]  Qi Zhang,et al.  A Regression-Based Analytic Model for Dynamic Resource Provisioning of Multi-Tier Applications , 2007, Fourth International Conference on Autonomic Computing (ICAC'07).

[11]  Qi Zhang,et al.  R-Capriccio: A Capacity Planning and Anomaly Detection Tool for Enterprise Services with Live Workloads , 2007, Middleware.

[12]  Richard F. Gunst,et al.  Applied Regression Analysis , 1999, Technometrics.

[13]  Anja Feldmann,et al.  Rate of Change and other Metrics: a Live Study of the World Wide Web , 1997, USENIX Symposium on Internet Technologies and Systems.