Harnessing The Power of Fast, Low Latency, Networks for Software DSMs

Asynchronous communication over fast, low-latency networks presents the challenge of delivering the performance such networks exhibit all the way from the network layer up to user-level applications. Applications that retrieve messages from the network through polling-based communication packages often face problems of responsiveness, CPU utilization, and redundant polling. Because software DSMs communicate asynchronously by nature, these overheads become apparent when a DSM is implemented on top of such communication packages. This paper addresses two critical issues in the design and implementation of software DSMs in the presence of fast, low-latency networks. First, it describes the implementation of a novel technique, called MultiView, which provides small-size pages. MultiView is used to tailor the basic sharing units used by the DSM to the native application data. This avoids false sharing, reduces message sizes, and helps prevent excessive buffer copying. Second, it proposes a solution for customizing network drivers to efficiently support asynchronous communication, the kind of communication most characteristic of DSMs. Performance evaluation shows that our methods improve responsiveness, CPU utilization, and overall DSM performance.
