Data streaming: very low overhead communication for fine-grained multicomputing

Recent developments have greatly reduced latencies in multiprocessor networks, so software overhead is becoming the primary cost of multiprocessor communication. This paper proposes data streaming, a technique that places explicit send and receive instructions in the user code, as a means of cutting software overhead to a minimum. Data streaming has the added benefit of tightening the coupling between processors by reducing the message size to that of a single data item. Experimental results indicate that data streaming can cut software overhead to less than one instruction per byte of data transmitted.
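As a rough illustration of the idea, the following C sketch models a channel between two processors as a small ring buffer and has the user code issue one explicit send and one explicit receive per data item. It is only a software stand-in under stated assumptions: the names `channel_t`, `stream_send`, `stream_recv`, and `stream_sum_of_squares`, the channel depth, and the 4-byte item size are all hypothetical and are not taken from the paper, and each call here costs many machine instructions, whereas the technique described above relies on send and receive being instructions in their own right.

```c
#include <stddef.h>
#include <stdint.h>

#define CHAN_CAP 8 /* hypothetical channel depth, in 32-bit words */

/* Software model of a one-way channel between two processors. */
typedef struct {
    uint32_t slot[CHAN_CAP];
    size_t head, tail, count;
    unsigned long sends, recvs; /* number of explicit send/receive operations */
} channel_t;

/* Explicit send of a single data item; returns 0 if the channel is full. */
static int stream_send(channel_t *c, uint32_t word) {
    if (c->count == CHAN_CAP) return 0;
    c->slot[c->tail] = word;
    c->tail = (c->tail + 1) % CHAN_CAP;
    c->count++;
    c->sends++;
    return 1;
}

/* Explicit receive of a single data item; returns 0 if the channel is empty. */
static int stream_recv(channel_t *c, uint32_t *word) {
    if (c->count == 0) return 0;
    *word = c->slot[c->head];
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    c->recvs++;
    return 1;
}

/* Producer and consumer interleaved in one loop: each value is sent as soon
   as it is computed and consumed as soon as it arrives, so the message size
   is a single data item rather than a packed buffer. */
static uint32_t stream_sum_of_squares(channel_t *c, uint32_t n) {
    uint32_t sum = 0;
    for (uint32_t i = 1; i <= n; i++) {
        uint32_t w = 0;
        stream_send(c, i * i);   /* producer side */
        if (stream_recv(c, &w))  /* consumer side */
            sum += w;
    }
    return sum;
}
```

Calling `stream_sum_of_squares` on a zero-initialized `channel_t` with `n = 10` performs ten sends carrying 40 bytes in total. In this model that is 0.25 send operations per byte, which is only meant to show how per-item sends of word-sized data could fall below one operation per byte; it is not a reproduction of the paper's measurements.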
