Multi-Protocol Active Messages on a Cluster of SMPs

Clusters of multiprocessors, or Clumps, promise to be the supercomputers of the future, but obtaining high performance on these architectures requires an understanding of the interactions between the multiple levels of interconnection. In this paper, we present the first multi-protocol implementation of a lightweight message layer: a version of Active Messages-II running on a cluster of Sun Enterprise 5000 servers connected with Myrinet. This research brings together several pieces of high-performance interconnection technology: bus backplanes for symmetric multiprocessors, low-latency networks for connections between machines, and simple, user-level primitives for communication. The paper describes the shared-memory message-passing protocol and analyzes the multi-protocol implementation with both microbenchmarks and Split-C applications. Three aspects of the communication layer are critical to performance: the overhead of cache-coherence mechanisms, the method of managing concurrent access, and the cost of accessing state with the slower protocol. Through the use of an adaptive polling strategy, the multi-protocol implementation limits performance interactions between the protocols, delivering up to 160 MB/s of bandwidth with 3.6 microsecond end-to-end latency. Applications within an SMP benefit from this fast communication, running up to 75% faster than on a network of uniprocessor workstations (NOW). Applications running on the entire Clump are limited by the balance of NICs to processors in our system, and are typically slower than on the NOW. These results illustrate several potential pitfalls for the Clumps architecture.
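The abstract does not spell out the adaptive polling strategy, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a poll always drains the fast shared-memory queue but checks the slower network queue only every few polls, doubling that gap while the network is idle and resetting it when a network message arrives. The class name `AdaptivePoller`, the two in-memory queues, and the doubling rule are all hypothetical.

```python
from collections import deque


class AdaptivePoller:
    """Polls a cheap local queue on every call and a costly network queue less often.

    Hypothetical sketch: the back-off rule is an assumption made for
    illustration and is not taken from the paper.
    """

    def __init__(self, min_interval=1, max_interval=16):
        self.shared_mem = deque()     # stand-in for the intra-SMP message queue
        self.network = deque()        # stand-in for the NIC receive queue
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval = min_interval  # polls between two network checks
        self.since_network = 0        # polls since the last network check
        self.network_checks = 0       # how often the slow path was taken

    def poll(self):
        """Return the list of messages delivered by this poll."""
        delivered = []

        # Fast protocol: always drain the shared-memory queue.
        while self.shared_mem:
            delivered.append(self.shared_mem.popleft())

        # Slow protocol: check only once every `interval` polls.
        self.since_network += 1
        if self.since_network >= self.interval:
            self.since_network = 0
            self.network_checks += 1
            if self.network:
                while self.network:
                    delivered.append(self.network.popleft())
                # Traffic seen: go back to checking the network often.
                self.interval = self.min_interval
            else:
                # Idle network: back off, up to the configured ceiling.
                self.interval = min(self.interval * 2, self.max_interval)
        return delivered
```

With these assumed defaults, a long run of idle polls touches the slow network path far less often than the fast path, while a pending network message is still picked up within at most `max_interval` polls. The ceiling trades network latency against the overhead added to shared-memory communication.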
