MPICH-GQ: Quality-of-Service for Message Passing Programs

Parallel programmers typically assume that all resources required for a program’s execution are dedicated to that purpose. However, in local and wide area networks, contention for shared networks, CPUs, and I/O systems can result in significant variations in availability, with consequent adverse effects on overall performance. We describe a new message-passing architecture, MPICH-GQ, that uses quality of service (QoS) mechanisms to manage contention and hence improve performance of message passing interface (MPI) applications. MPICH-GQ combines new QoS specification, traffic shaping, QoS reservation, and QoS implementation techniques to deliver QoS capabilities to the high-bandwidth bursty flows, complex structures, and reliable protocols used in high-performance applications-characteristics very different from the low-bandwidth, constant bit-rate media flows and unreliable protocols for which QoS mechanisms were designed. Results obtained on a differentiated services testbed demonstrate our ability to maintain application performance in the face of heavy network contention.

[1]  Klara Nahrstedt,et al.  A distributed resource management architecture that supports advance reservations and co-allocation , 1999, 1999 Seventh International Workshop on Quality of Service. IWQoS'99. (Cat. No.98EX354).

[2]  Ian Foster,et al.  A quality of service architecture that combines resource reservation and application adaptation , 2000, 2000 Eighth International Workshop on Quality of Service. IWQoS 2000 (Cat. No.00EX400).

[3]  Klara Nahrstedt,et al.  QoS-aware resource management for distributed multimedia applications^{1} , 1998, J. High Speed Networks.

[4]  William Gropp,et al.  Mpi - The Complete Reference: Volume 2, the Mpi Extensions , 1998 .

[5]  Douglas C. Schmidt,et al.  Evaluating policies and mechanisms for supporting embedded, real-time applications with CORBA 3.0 , 2000, Proceedings Sixth IEEE Real-Time Technology and Applications Symposium. RTAS 2000.

[6]  Thomas Eickermann,et al.  Performance Issues of Distributed MPI Applications in a German Gigabit Testbed , 1999, PVM/MPI.

[7]  Scott Shenker,et al.  Integrated Services in the Internet Architecture : an Overview Status of this Memo , 1994 .

[8]  Olov Schelén,et al.  Advance reservations for predictive service in the Internet , 1997, Multimedia Systems.

[9]  Matthew Mathis,et al.  The macroscopic behavior of the TCP congestion avoidance algorithm , 1997, CCRV.

[10]  Douglas Comer,et al.  Internetworking with TCP/IP , 1988 .

[11]  Anthony Skjellum,et al.  Using MPI - portable parallel programming with the message-parsing interface , 1994 .

[12]  Zheng Wang,et al.  An Architecture for Differentiated Services , 1998, RFC.

[13]  T. V. Lakshman,et al.  Window-based error recovery and flow control with a slow acknowledgement channel: a study of TCP/IP performance , 1997, Proceedings of INFOCOM '97.

[14]  Klara Nahrstedt,et al.  CPU service classes for multimedia applications , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[15]  Henning Schulzrinne,et al.  Network quality of service , 1998 .

[16]  William Gropp,et al.  MPI: The Complete Reference , Vol. 2 - The MPI-2 Extensions , 1998 .

[17]  Jason Lee,et al.  Distributed parallel data storage systems: a scalable approach to high speed image servers , 1994, MULTIMEDIA '94.

[18]  Anthony Skjellum,et al.  A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..

[19]  Bronis R. de Supinski,et al.  Exploiting hierarchy in parallel computer networks to optimize collective operation performance , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.

[20]  Henri E. Bal,et al.  MagPIe: MPI's collective communication operations for clustered wide area systems , 1999, PPoPP '99.

[21]  William Gropp,et al.  Mpi---the complete reference: volume 1 , 1998 .

[22]  Ian T. Foster,et al.  A Grid-Enabled MPI: Message Passing in Heterogeneous Distributed Computing Systems , 1998, Proceedings of the IEEE/ACM SC98 Conference.

[23]  David L. Black,et al.  An Architecture for Differentiated Service , 1998 .

[24]  William E. Johnston,et al.  QoS as Middleware: Bandwidth Brokering System Design , 1998 .

[25]  Jack Dongarra,et al.  MPI: The Complete Reference , 1996 .

[26]  Klara Nahrstedt,et al.  Design, Implementation, and Experiences of the OMEGA End-Point Architecture , 1996, IEEE J. Sel. Areas Commun..

[27]  Kang G. Shin,et al.  Structuring communication software for quality-of-service guarantees , 1996, 17th IEEE Real-Time Systems Symposium.

[28]  Ian T. Foster,et al.  A differentiated services implementation for high-performance TCP flows , 2000, Comput. Networks.

[29]  John A. Zinky,et al.  Architectural Support for Quality of Service for CORBA Objects , 1997, Theory Pract. Object Syst..

[30]  Peter Steenkiste,et al.  Darwin: customizable resource management for value-added network services , 1998, Proceedings Sixth International Conference on Network Protocols (Cat. No.98TB100256).

[31]  Michael M. Resch,et al.  Distributed Computing in a Heterogeneous Computing Environment , 1998, PVM/MPI.

[32]  Richard Wolski,et al.  Forecasting network performance to support dynamic scheduling using the network weather service , 1997, Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183).

[33]  William Gropp,et al.  Skjellum using mpi: portable parallel programming with the message-passing interface , 1994 .

[34]  Lixia Zhang,et al.  Resource ReSerVation Protocol (RSVP) - Version 1 Functional Specification , 1997, RFC.