FM-QoS: Real-time Communication using Self-synchronizing Schedules

FM-QoS employs a novel communication architecture based on network feedback to provide predictable communication performance (e.g., deterministic latencies and guaranteed bandwidths) for high-speed cluster interconnects. Network feedback is combined with self-synchronizing communication schedules to achieve synchrony among the network interfaces (NIs). Based on this synchrony, the network can be scheduled to provide predictable performance without special network QoS hardware. We describe the key element of the FM-QoS approach, feedback-based synchronization (FBS), which exploits network feedback to synchronize senders. We use Petri nets to characterize the set of self-synchronizing communication schedules for which FBS is effective and to describe the resulting synchronization overhead as a function of the clock drift across the network nodes. Analytic modeling suggests that for clocks of 300 ppm quality (such as those found in the Myrinet NI), a synchronization overhead of less than 1% of the total communication traffic is achievable. This is significantly better than previous software-based schemes and comparable to hardware-intensive approaches such as virtual circuits (e.g., ATM). We have built a prototype of FBS for Myricom's Myrinet network (a 1.28 Gbps cluster network), which demonstrates the viability of the approach by sharing network resources with predictable performance. The prototype, which implements the local node schedule in software, achieves predictable latencies of 23 µs for a single-switch, 8-node network and 2 KB packets. In comparison, the best-effort scheme without FBS achieves 104 µs on the same network. While this ratio of over four to one already demonstrates the viability of the approach, the FBS latency includes nearly 10 µs of overhead due to the software implementation. For hardware implementations of local node scheduling, and for networks with cascaded switches, the ratio should be considerably larger.
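The figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the paper's analytic model: the function names are hypothetical, the software overhead is taken as a round 10 µs (the abstract says "nearly 10 µs"), and the 1 ms interval in the drift calculation is an arbitrary assumption chosen to show the scale of a 300 ppm clock error.

```python
# Illustrative arithmetic only; this is NOT the paper's analytic model.
# Function names and the 1 ms interval are assumptions made for this sketch.

def latency_ratio(best_effort_us, fbs_us, software_overhead_us=0.0):
    """Ratio of best-effort latency to FBS latency.

    If software_overhead_us is given, it is subtracted from the FBS latency
    to approximate a hardware implementation of local node scheduling.
    """
    return best_effort_us / (fbs_us - software_overhead_us)


def worst_case_drift_us(interval_us, clock_ppm):
    """Largest offset a clock of the given quality can accumulate
    against an ideal clock over interval_us microseconds."""
    return interval_us * clock_ppm * 1e-6


# Latencies reported in the abstract: 104 us best-effort, 23 us with FBS.
measured = latency_ratio(104.0, 23.0)

# Assumed round figure of 10 us for the software scheduling overhead.
without_software_overhead = latency_ratio(104.0, 23.0, software_overhead_us=10.0)

# A 300 ppm clock observed over an assumed 1 ms (1000 us) interval.
drift = worst_case_drift_us(1000.0, 300.0)

print(f"measured ratio: {measured:.2f}")
print(f"ratio without software overhead: {without_software_overhead:.2f}")
print(f"worst-case drift over 1 ms: {drift:.2f} us")
```

Under these assumptions the measured ratio is roughly 4.5, and discounting the software overhead raises it to about 8, which is consistent with the expectation that a hardware implementation would widen the gap.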
