An efficient communication scheme for distributed parallel processor systems

The bandwidth and latency of interprocessor networks are currently the main factors limiting the speedup achievable on multiprocessor computers. There is, however, still potential for making better use of the available bandwidth. In this paper, we present a communication scheme that supports the data-parallel programming model and can be implemented entirely in hardware. This allows us to discard the usual software layer between application programs and the network interface, leading to more efficient use of the bandwidth. The scheme can be roughly described as "ordered multicast". Two different bus-based multiprocessor systems using this scheme have been built. To overcome the limitations of these single-board systems, the communication scheme has been adapted to a ring network. A prototype of this network has been implemented and is operational at 100 Mbit/s using standard optical transmitters and receivers.