Toward Self-Healing Multitier Services

Are self-healing database-centric multitier services a utopia, or just a hard puzzle? We argue for the latter and aim to identify the missing pieces of this puzzle. We advocate robust and scalable learning-based approaches to self-healing that we expect to work well for a large class of multitier services. We identify performance-availability problems (PAPs) as the most relevant target for self-healing, and argue that PAPs are best addressed macroscopically, outside the realm of individual tiers. Finally, we lay out a research agenda for learning-based approaches to self-healing that would enable wider deployment of self-healing multitier services.
