Reducing Diff Overhead in Software DSM Systems using RDMA Operations in InfiniBand

Software DSM systems do not perform well because of the combined effects of increased communication, slow networks, and the large overhead of processing the coherence protocol. Modern interconnects such as Myrinet, Quadrics, and InfiniBand offer reliable, low-latency (around 5.0 µs point-to-point), high-bandwidth (up to 10.0 Gbps in 4X InfiniBand) communication. These networks also support efficient memory-based communication primitives such as RDMA-Read and RDMA-Write. This support can be leveraged to reduce overhead in a software DSM system. In this paper, we explore techniques for reducing the diff overhead. These techniques are employed in a protocol called PIPE, which uses RDMA-Write. Application-level evaluation shows an improvement of up to 35% in parallel speedup.
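
For readers unfamiliar with the term, the "diff" in multiple-writer software DSMs is typically a run-length encoding of the bytes a node changed in a page, computed by comparing the page against a pristine copy (a "twin") saved before the first write. The sketch below illustrates that general twin-and-diff idea only; it is not the PIPE protocol, and the names `make_diff` and `apply_diff`, the 16-byte page, and the run format are assumptions chosen for illustration. In an RDMA-based design, each run could plausibly be placed at the matching offset of the remote copy with an RDMA-Write instead of being applied by the remote CPU, but that mapping is likewise an assumption here.

```python
from typing import List, Tuple

Run = Tuple[int, bytes]  # (offset within page, modified bytes); hypothetical format


def make_diff(twin: bytes, page: bytes) -> List[Run]:
    """Compare a modified page with its twin and return the changed runs."""
    if len(twin) != len(page):
        raise ValueError("twin and page must be the same size")
    runs: List[Run] = []
    i, n = 0, len(page)
    while i < n:
        if twin[i] != page[i]:
            start = i
            # Extend the run while consecutive bytes differ from the twin.
            while i < n and twin[i] != page[i]:
                i += 1
            runs.append((start, bytes(page[start:i])))
        else:
            i += 1
    return runs


def apply_diff(target: bytearray, runs: List[Run]) -> None:
    """Write each run into the target copy at its recorded offset."""
    for offset, data in runs:
        target[offset:offset + len(data)] = data


# Illustrative 16-byte "page" (real pages are typically several kilobytes).
twin = bytes(16)               # pristine copy saved before the first write
page = bytearray(twin)         # working copy that the local node modifies
page[2:4] = b"\x01\x02"
page[10] = 0xFF

diff = make_diff(twin, bytes(page))

home = bytearray(16)           # stand-in for the remote (home) copy of the page
apply_diff(home, diff)
```

Only the modified bytes and their offsets travel between nodes, which is why the cost of creating and applying diffs, rather than raw page transfer, tends to dominate in such systems.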
