Remote queues: exposing message queues for optimization and atomicity

We introduce Remote Queues (RQ), a communication model that integrates polling with selective interrupts to support a wide range of applications and communication paradigms. We show that polling benefits many applications in both performance and atomicity. It enables optimizations that are essential for fine-grain applications such as sparse-matrix solution, and it improves flow control for high-level communication patterns such as transpose. We use RQ to implement active messages, bulk transfers, and fine-grain applications on the MIT Alewife, Intel Paragon, and Cray T3D, using very different implementations of RQ. RQ improves performance on all three machines and provides atomicity guarantees that greatly simplify programming for the user. RQ also separates handler invocation from draining the network, which simplifies deadlock avoidance and multiprogramming. Finally, we introduce efficient atomicity mechanisms on Alewife that integrate polling with interrupts, and we discuss how to exploit interrupts on Alewife and the Intel Paragon without forfeiting the atomicity and optimization advantages of RQ.
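
The separation of message arrival from handler invocation can be illustrated with a minimal single-process sketch. This is a toy model under stated assumptions, not the paper's implementation: the class `RemoteQueue` and its methods `enqueue` and `poll` are hypothetical names chosen for illustration, and no real network or interrupt hardware is modeled.

```python
from collections import deque


class RemoteQueue:
    """Toy model of a receive queue (hypothetical API, not from the paper)."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, handler, payload):
        # Models the network depositing a message. The message is only
        # buffered here; no handler runs at arrival time.
        self._pending.append((handler, payload))

    def poll(self, limit=None):
        # The receiver decides when to dequeue, so handlers never
        # interleave with the code that runs between polls.
        handled = 0
        while self._pending and (limit is None or handled < limit):
            handler, payload = self._pending.popleft()
            handler(payload)
            handled += 1
        return handled


def run_demo():
    received = []
    rq = RemoteQueue()
    for value in (1, 2, 3):
        rq.enqueue(received.append, value)
    before = list(received)  # snapshot taken before any poll
    handled = rq.poll()
    return before, received, handled
```

In this sketch, buffering a message and running its handler are distinct steps, so the receiving code controls exactly where handlers may execute. A real system would additionally need a bounded buffer and some interrupt-driven path for messages that cannot wait for the next poll.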
