PDIO: High-Performance Remote File I/O for Portals-Enabled Compute Nodes

Portals Direct I/O ("PDIO") is a special-purpose middleware infrastructure for writing data from compute processor memory on Portals-enabled compute nodes to remote agents anywhere on the WAN in real time. The prototype implementation provided a means for aggregation of outgoing data through multiple load-balanced routing daemons, end-to-end parallel data streams through externally connected “I/O nodes”, and a bandwidth feedback mechanism for stability and robustness. It was used by one research group, demonstrated live at several conferences, and shown to deliver bandwidths of up to 800 Mbit/sec. Although the prototype met the initial design requirements for the target application, it had some limitations due to the special-purpose nature of that design. Based on experiences with that implementation, the beta version now under development has a number of interface, functionality, and performance enhancements. We present the motivations for this infrastructure and the revisions that should make it a general-purpose solution for users on PSC’s Cray XT3 and other compute platforms.
