Improving the performance of message-passing applications by multithreading

Achieving maximum performance in message-passing programs requires that calculation and communication be overlapped. However, the program transformations required to achieve this overlap are error-prone and add significant complexity to the application program. The authors argue that calculation/communication overlap can be achieved easily and consistently by executing multiple threads of control on each processor, and that this approach is practical on message-passing architectures without any special hardware support. They present timing data for a typical message-passing application, to demonstrate the advantages of the scheme.<<ETX>>