A survey of self‐healing systems frameworks

Rising complexity within multi‐tier computing architectures remains an open problem. As complexity increases, so do the costs of operating and maintaining systems within these environments. One approach to addressing these problems is to build self‐healing systems (i.e. frameworks) that can autonomously detect and recover from faulty states. Self‐healing systems often combine machine learning techniques with closed control loops to reduce the number of situations requiring human intervention. This is particularly useful where human involvement is both costly and a source of potential faults. This paper therefore surveys self‐healing frameworks and methodologies in multi‐tier architectures. Uniquely, this study combines an overview of the state of the art with a comparative analysis of the computing environment, degree of behavioural autonomy, and organisational requirements of these approaches. Highlighting these aspects clarifies the situational benefits of the different self‐healing systems. We conclude with a discussion of current and potential research directions within this field. Copyright © 2014 John Wiley & Sons, Ltd.
