A new approach to provide real-time services on high-speed local area networks

In the past few years, networks of workstations (NOWs) and clusters based on high-speed local area networks (LANs) have emerged as a serious alternative to supercomputers and high-performance servers. Meanwhile, applications demanding real-time network services have also grown substantially. To use NOWs for distributed real-time processing, a mechanism must be provided that tolerates topology changes and faults and guarantees a maximum latency or a minimum bandwidth in the worst case. Until now, the backup channel protocol (BCP), based on real-time channels, has provided fault-tolerant real-time services. In this approach, however, fault tolerance is limited by the alternative paths that the routing function offers for establishing the backup channels, and topology changes are not supported. On the other hand, dynamic reconfiguration updates the routing tables without stopping user traffic when a topology change or fault occurs. Dynamic reconfiguration by itself provides neither quality of service nor real-time services, but it can support an additional mechanism designed to meet real-time requirements. In this paper, we propose a new hardware-supported protocol that provides real-time services on NOWs while tolerating topology changes and faults. The novelty of our proposal lies primarily in its ability to assimilate hot topology changes and faults while still providing real-time services, by combining backup channels with dynamic reconfiguration. Our protocol increases fault tolerance beyond the level provided by the backup channel protocol, so that fault tolerance is limited only by topology connectivity. Furthermore, to our knowledge, our protocol is the only mechanism able to assimilate hot updates without stopping either real-time traffic or normal network operation.
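The difference between the two limits on fault tolerance described above (alternative paths offered by a fixed routing function versus topology connectivity) can be illustrated with a small routing sketch. This is not the protocol proposed in the paper: it assumes an undirected topology given as a set of links, uses breadth-first search as a stand-in for the routing function, and ignores bandwidth and latency reservation, deadlock freedom, and hardware support. The hypothetical `static_candidates` models a fixed routing function that offers one primary path and at most one link-disjoint backup per source/destination pair (the BCP-like case), while the hypothetical `route_reconfigured` recomputes the route over the surviving links, in the way dynamic reconfiguration updates the routing tables.

```python
from collections import deque

# Links are undirected and modeled as frozenset({u, v}). All names here are
# hypothetical and chosen for illustration only.


def shortest_path(links, src, dst, banned=frozenset()):
    """Breadth-first shortest path from src to dst, skipping banned links.

    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    adj = {}
    for link in links - banned:
        a, b = tuple(link)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):  # sorted for deterministic output
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def path_links(path):
    """Set of undirected links traversed by a path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def static_candidates(links, src, dst):
    """Fixed routing function (BCP-like assumption): computed once at start-up,
    offering a primary path plus at most one link-disjoint backup path."""
    primary = shortest_path(links, src, dst)
    if primary is None:
        return []
    backup = shortest_path(links, src, dst, banned=frozenset(path_links(primary)))
    return [primary] if backup is None else [primary, backup]


def route_static(candidates, failed):
    """Use the first precomputed path that avoids every failed link."""
    for path in candidates:
        if not (path_links(path) & failed):
            return path
    return None


def route_reconfigured(links, src, dst, failed):
    """Stand-in for dynamic reconfiguration: recompute the route over the
    surviving links, so it fails only if src and dst are disconnected."""
    return shortest_path(links, src, dst, banned=frozenset(failed))


if __name__ == "__main__":
    # Three link-disjoint paths from A to E: via B, via C, and via D.
    links = {frozenset(p) for p in [("A", "B"), ("B", "E"), ("A", "C"),
                                    ("C", "E"), ("A", "D"), ("D", "E")]}
    candidates = static_candidates(links, "A", "E")
    failed = {frozenset(("A", "B")), frozenset(("A", "C"))}
    print(candidates)                                   # [['A', 'B', 'E'], ['A', 'C', 'E']]
    print(route_static(candidates, failed))             # None
    print(route_reconfigured(links, "A", "E", failed))  # ['A', 'D', 'E']
```

In this toy topology, node A reaches E over three link-disjoint paths. After links A-B and A-C fail, both precomputed paths are broken and the fixed routing function offers nothing, whereas recomputing over the surviving links still finds A-D-E. Recomputation also fails only once every path is cut, that is, when A and E are disconnected.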
