Implementation and Performance of Portals 3.3 on the Cray XT3

The Portals data movement interface was developed at Sandia National Laboratories in collaboration with the University of New Mexico over the last ten years. Portals is intended to provide the functionality necessary to scale a distributed memory parallel computing system to thousands of nodes. Previous versions of Portals ran on several large-scale machines, including a 1024-node nCUBE-2, an 1800-node Intel Paragon, and the 4500-node Intel ASCI Red machine. The latest version of Portals was initially developed for an 1800-node Linux/Myrinet cluster and has since been adopted by Cray as the lowest-level network programming interface for their XT3 platform. In this paper, we describe the implementation of Portals 3.3 on the Cray XT3 and present some initial performance results from several micro-benchmark tests. Despite some limitations, the implementation of Portals is able to achieve a zero-length one-way latency of under six microseconds and a uni-directional bandwidth of more than 1.1 GB/s.