A failure detector based on processes’ relevance and the confidence degree in the system for self-healing in ubiquitous environments

Ubiquitous systems have become a common technology in everyday life, which increases our dependency on them. Failures in such systems can reduce their applicability and usability, with consequences that may be troublesome or even dangerous. Such systems must therefore possess self-healing capabilities in order to detect failures and make the adjustments necessary to prevent their impact on applications. In this context, this work proposes a new and flexible unreliable failure detector (FD), denoted the Impact failure detector, which is based on process relevance and the confidence degree in the system, for self-healing systems in ubiquitous environments. The output of the Impact FD expresses the confidence in the system as a whole. By expressing the relevance of each node as an impact factor value, together with a margin of acceptable failures of the system, the Impact FD enables the user to tune the failure detection configuration in accordance with the requirements of the application. The performance evaluation results confirm the flexibility of our FD and show that, owing to the margin of failure, the number of false responses may be reduced in comparison with traditional unreliable FDs.
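
The idea of weighting nodes by an impact factor and tolerating a margin of failures can be illustrated with a minimal sketch. The abstract does not state how impact factors are combined, so the code below assumes that the confidence in the system is the sum of the impact factors of the nodes not currently suspected, and that the system is trusted while this sum stays at or above a user-chosen threshold. The names `Node`, `trust_level`, `system_is_trusted`, and the example weights are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    impact: int  # relevance of the node (hypothetical example weights)


def trust_level(nodes, suspected):
    """Sum the impact factors of the nodes that are not currently suspected."""
    return sum(n.impact for n in nodes if n.name not in suspected)


def system_is_trusted(nodes, suspected, threshold):
    """Assumed rule: the system is trusted while the trust level
    stays at or above the threshold (the margin of acceptable failures)."""
    return trust_level(nodes, suspected) >= threshold


# Hypothetical deployment: one highly relevant sink and three sensors.
nodes = [
    Node("sink", 5),
    Node("sensor_a", 1),
    Node("sensor_b", 1),
    Node("sensor_c", 1),
]
threshold = 6  # tolerates losing up to two sensors, but never the sink

print(system_is_trusted(nodes, {"sensor_a", "sensor_b"}, threshold))
print(system_is_trusted(nodes, {"sink"}, threshold))
```

Under these assumptions, suspecting two low-impact sensors leaves the system trusted, whereas suspecting the sink alone does not. This is the kind of application-specific tuning that the impact factors and the margin are meant to allow.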
