Implementing TreadMarks over GM on Myrinet: challenges, design experience, and performance evaluation

Software-based DSM systems such as TreadMarks have traditionally performed poorly compared to message-passing applications because of the high communication overhead of traditional stack-based protocols such as UDP. Modern interconnects such as Myrinet offer reliable message delivery with very low communication overhead through user-level protocols. This paper examines the viability of implementing a thin communication substrate between TreadMarks and Myrinet GM, the rationale being that a layer tuned to the needs of the application would offer better performance and scalability than a generic UDP layer. We discuss trade-offs among design alternatives for buffer management, connection setup, advance posting of descriptors, and asynchronous messages. We have implemented the best of these strategies in a layer that is bound to TreadMarks at compile time. Results from micro-benchmarks and applications show that the specialized implementation not only performs better but also exhibits better parallel speedup and scalability. Compared with the original implementation, total application execution time is reduced by up to a factor of 6.3 on a 16-node system. The implementation also scales better as the application size is increased.
