Exploiting Latent I/O Asynchrony in Petascale Science Applications

We present a collection of techniques for exploiting latent I/O asynchrony that can substantially improve performance in data-intensive parallel applications. Latent asynchrony is an application's tolerance for decoupling ancillary operations from its core computation, a property of HPC codes that current HPC I/O systems have not fully explored. Decoupling operations such as buffering, staging, reorganization, and format conversion from the core code, in both space and time, can shorten I/O phases and preserve valuable MPP compute cycles. This paper describes DataTaps, IOgraphs, and Metabots, three tools that allow HPC developers to implement decoupled I/O operations. With these tools, data generators can exploit asynchrony by overlapping computation with communication, and data consumers can exploit it by performing data conversion and reorganization out-of-band and on demand. In the context of a data-intensive fusion simulation, we show that exploiting latent asynchrony through such decoupling can provide significant performance benefits.
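The general idea of decoupling output handling from the core computation can be illustrated with a minimal Python sketch. It is not the DataTaps, IOgraphs, or Metabots interface, which the abstract does not specify. The function name `run_simulation`, the staging-buffer size, and the sorting step standing in for reorganization are all assumptions chosen for illustration.

```python
import queue
import threading


def run_simulation(steps, sink):
    """Compute `steps` timesteps, handing each result to a background writer.

    Hypothetical illustration only: `sink` is a plain list standing in for
    storage, and sorting stands in for data reorganization.
    """
    pending = queue.Queue(maxsize=4)  # bounded staging buffer (size is an assumption)

    def drain():
        # Consumer side: reorganize and "write" off the compute path.
        while True:
            item = pending.get()
            if item is None:  # sentinel: the producer has finished
                break
            step, data = item
            sink.append((step, sorted(data)))

    writer = threading.Thread(target=drain)
    writer.start()

    for step in range(steps):
        # Stand-in for the core computation of one timestep.
        data = [(step * 7 + i * 3) % 11 for i in range(5)]
        # Hand off the buffer; this blocks only when the staging buffer is full.
        pending.put((step, data))

    pending.put(None)
    writer.join()
    return sink


if __name__ == "__main__":
    for step, values in run_simulation(3, []):
        print(step, values)
```

In this sketch the compute loop waits on output only when the bounded queue fills up, which loosely mirrors a shortened I/O phase; a real HPC code would stage data to separate nodes rather than to a thread in the same process.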
