A Failure Recovery Protocol for Software-Defined Real-Time Networks

In a distributed computing environment, real-time tasks communicate over a network infrastructure whose stability significantly affects timing predictability. Network stability has two aspects. First, the network has to guarantee the deadline requirements of real-time message transmissions in the absence of network failures. Second, the network needs to support dynamic recovery when network failures occur. This paper generalizes previous static routing approaches, which address the first aspect of network stability, by developing a dynamic failure recovery policy and a protocol that address the second aspect. We derive new real-time forwarding paths without compromising the capability of network devices to guarantee the deadlines of concurrent real-time transmissions. We implement this mechanism on a network simulation platform and evaluate it on real hardware in a local cluster to demonstrate its feasibility and effectiveness. Experiments confirm that recovery delays can be bounded based on the network parameters.
