Transferring a petabyte in a day

Abstract: Extreme-scale simulations and experiments can generate large amounts of data, whose volume can exceed the compute and/or storage capacity at the simulation or experimental facility. With the emergence of ultra-high-speed networks, researchers are considering pipelined approaches in which data are passed to a remote facility for analysis. Here we examine an extreme-scale cosmology simulation that, when run on a large fraction of a leadership computer, generates data at a rate of one petabyte per elapsed day. Writing those data to disk is inefficient and impractical, and in situ analysis poses its own difficulties. Thus we implement a pipeline in which data are generated on one supercomputer and then transferred, as they are generated, to a remote supercomputer for analysis. We use the Swift scripting language to instantiate this pipeline across Argonne National Laboratory and the National Center for Supercomputing Applications, which are connected by a 100 Gb/s network. We demonstrate that by using the Globus transfer service we can achieve a sustained rate of 93 Gb/s over a 24-hour period, thus attaining our performance goal of one petabyte moved in 24 hours. This paper describes the methods used and summarizes the lessons learned in this demonstration.
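The link between the 93 Gb/s sustained rate and the one-petabyte-per-day goal can be checked with a short calculation. The sketch below is an illustration only, not part of the described pipeline; it assumes decimal units (1 Gb = 10^9 bits, 1 PB = 10^15 bytes), and the function names are hypothetical.

```python
# Back-of-the-envelope check of the abstract's figures.
# Assumption: decimal units (1 Gb = 10**9 bits, 1 PB = 10**15 bytes).
# Function names are hypothetical, chosen for this illustration.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
BITS_PER_BYTE = 8


def petabytes_moved(rate_gbps: float, seconds: float = SECONDS_PER_DAY) -> float:
    """Data volume (PB) moved at a sustained rate (Gb/s) over `seconds`."""
    bits = rate_gbps * 1e9 * seconds
    return bits / BITS_PER_BYTE / 1e15


def required_rate_gbps(petabytes: float, seconds: float = SECONDS_PER_DAY) -> float:
    """Sustained rate (Gb/s) needed to move `petabytes` within `seconds`."""
    bits = petabytes * 1e15 * BITS_PER_BYTE
    return bits / seconds / 1e9


if __name__ == "__main__":
    print(f"93 Gb/s for 24 h moves {petabytes_moved(93):.3f} PB")
    print(f"1 PB in 24 h needs {required_rate_gbps(1):.1f} Gb/s")
```

Under these assumptions, one petabyte per day requires roughly 92.6 Gb/s sustained, so the reported 93 Gb/s clears the goal with little margin on a 100 Gb/s link.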
