Reliable communications in FTL

Local-area networks based on high-bandwidth packet-switching technology, such as ATM, show tremendous promise for reducing communication latency and overhead. However, ATM networks provide neither flow control nor reliable delivery, so the higher protocol layers must deal with cell loss and corruption. While it is possible to run TCP-based communications over ATM, the protocol mismatch results in a significant loss of bandwidth. There is also a significant mismatch between the stream-based communication model provided by TCP and the message-oriented communication model required by parallel applications. This paper describes the design and implementation of a reliable messaging layer aimed at workstation clusters using broadcast and packet-switched local area networks.