An Event-driven Architecture for MPI Libraries

Existing MPI libraries tie the progress of message transmission and reception to the application's calls into the library. This coupling keeps the implementation simple, but it can increase communication latency and waste CPU resources. This paper proposes adding an event-driven communication thread that advances messaging inside the library independently of the application thread, thereby decoupling communication progress from the application's library calls. The asynchronous event thread allows messages to be sent and received concurrently with application execution, which substantially improves the library's responsiveness to network traffic. Microbenchmark results show that the time spent waiting for non-blocking receives to complete can be significantly reduced or even eliminated entirely. On the NAS Parallel Benchmarks, application performance improves by 4.5% on average, with a peak improvement of 9.2%.
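
The decoupling described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: `ProgressThread`, `RecvRequest`, and `post_recv` are hypothetical names, a socket pair stands in for a network connection, and the standard `selectors` module stands in for whichever OS event mechanism (epoll, kqueue, poll) a real library would use. A dedicated thread sleeps until the socket becomes readable, appends incoming bytes to a buffer, and completes posted receives in FIFO order, so the application thread never has to call into the library for a message to make progress.

```python
import selectors
import socket
import threading


class RecvRequest:
    """Hypothetical handle for a posted non-blocking receive (not an MPI type)."""

    def __init__(self, nbytes):
        self.nbytes = nbytes
        self.buf = b""
        self.done = threading.Event()  # set by the progress thread on completion


class ProgressThread(threading.Thread):
    """Hypothetical event-driven progress engine: it blocks in the OS event
    mechanism (chosen by `selectors`) and completes posted receives without
    any library call from the application thread."""

    def __init__(self, sock):
        super().__init__(daemon=True)
        self.sock = sock
        self.lock = threading.Lock()
        self.pending = []           # posted receives, matched in FIFO order
        self.inbox = bytearray()    # bytes that arrived before a matching post
        self._stop_r, self._stop_w = socket.socketpair()
        self.sel = selectors.DefaultSelector()
        self.sel.register(sock, selectors.EVENT_READ)
        self.sel.register(self._stop_r, selectors.EVENT_READ)

    def post_recv(self, nbytes):
        """Non-blocking: record the request and return immediately."""
        req = RecvRequest(nbytes)
        with self.lock:
            self.pending.append(req)
            self._match()           # data may already be waiting in the inbox
        return req

    def stop(self):
        self._stop_w.send(b"x")     # wake the selector so run() can exit

    def _match(self):
        # Caller holds self.lock. Complete every request whose data has arrived.
        while self.pending and len(self.inbox) >= self.pending[0].nbytes:
            req = self.pending.pop(0)
            req.buf = bytes(self.inbox[:req.nbytes])
            del self.inbox[:req.nbytes]
            req.done.set()

    def run(self):
        try:
            while True:
                # Sleep until the network socket or the stop pipe is readable.
                for key, _ in self.sel.select():
                    if key.fileobj is self._stop_r:
                        return
                    data = self.sock.recv(65536)
                    if not data:    # peer closed the connection
                        return
                    with self.lock:
                        self.inbox += data
                        self._match()
        finally:
            self.sel.close()


if __name__ == "__main__":
    app_end, peer_end = socket.socketpair()
    engine = ProgressThread(app_end)
    engine.start()
    req = engine.post_recv(5)       # returns at once, like a non-blocking receive
    peer_end.sendall(b"hello")      # the message arrives while the app computes
    req.done.wait(timeout=5)        # blocks only if the data is still in flight
    print(req.buf)
    engine.stop()
    engine.join(timeout=5)
```

Because completion is signaled through a `threading.Event`, a later wait returns immediately if the message arrived while the application was computing, which mirrors the reduced receive-wait time reported above. A real MPI library would also need source/tag matching, send-side progress, and careful locking between the event thread and the rest of the library; the sketch omits all of these.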
