A novel CFN-Watchdog protocol for edge computing

Abstract: Compute first networking (CFN) is a recent distributed computing framework that intelligently allocates computing resources for edge computing according to computing load and network status. It requires real-time visibility into the availability of local and remote computing resources. To the best of our knowledge, this paper is the first to propose a centralized fault-detection protocol, called CFN-Watchdog, that meets this CFN requirement and promptly reclaims resources occupied by faulty tasks. We then theoretically analyze the impact of various parameters (e.g., detection thresholds, task processing time, and network delay) on Watchdog performance. Extensive simulations verify the effectiveness of the proposed protocol and the accuracy of the theoretical model. The results can guide parameter configuration and the design of fault-detection protocols for edge computing.
