Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure

While future terabit networks hold the promise of significantly improving the movement of big data among geographically distributed data centers, significant challenges must be overcome even on today's 100 Gb/s networks to realize end-to-end performance. Multiple bottlenecks exist along the end-to-end path from source to sink; in particular, the data storage infrastructure at both endpoints and its interplay with the wide-area network are increasingly the limiting factors. In this paper, we identify the issues that lead to congestion along the path of an end-to-end data transfer in the terabit network environment, and we present LADS, a new bulk data movement framework for terabit networks. LADS exploits the underlying storage layout at each endpoint to maximize throughput without degrading the performance of shared storage resources for other users. LADS also uses the Common Communication Interface (CCI) in lieu of the sockets interface to benefit from hardware-level zero-copy and operating-system-bypass capabilities when available. It can further improve transfer performance under end-system congestion by buffering data at the source on flash storage. Our evaluations show that LADS avoids congested storage elements within the shared storage resource, improving both I/O bandwidth and data transfer rates across high-speed networks. We also investigate the performance degradation that LADS suffers from I/O contention on the parallel file system (PFS) when multiple LADS instances share the PFS, and we design and evaluate a meta-scheduler that coordinates the competing I/O streams to minimize this contention. Our evaluations show that LADS with meta-scheduling improves performance by up to 14 percent relative to LADS without meta-scheduling.
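
To make the layout-aware scheduling idea more concrete, the C sketch below bins file chunks into per-OST work queues and drains the least-loaded, non-congested queue first, approximating the congestion-avoidance behavior described in the abstract. This is only an illustrative sketch under assumed names and parameters (ost_queue, pick_ost, NUM_OSTS, the congestion flag); it is not the actual LADS implementation and makes no CCI calls.

/* Hypothetical sketch of layout-aware scheduling: file chunks are binned
 * into per-OST queues, and a worker drains the non-congested queue with
 * the most pending work first.  Illustrative only, not LADS source code. */
#include <stdio.h>

#define NUM_OSTS 4          /* assumed number of object storage targets  */
#define CHUNKS_PER_OST 8    /* capacity of each per-OST work queue       */

struct chunk { long file_offset; int ost; };

struct ost_queue {
    struct chunk items[CHUNKS_PER_OST];
    int count;              /* outstanding chunks on this OST            */
    int congested;          /* 1 if the OST is currently overloaded      */
};

static struct ost_queue queues[NUM_OSTS];

/* Enqueue a chunk on the queue of the OST that actually stores it. */
static int enqueue_chunk(long offset, int ost)
{
    struct ost_queue *q = &queues[ost];
    if (q->count == CHUNKS_PER_OST)
        return -1;
    q->items[q->count].file_offset = offset;
    q->items[q->count].ost = ost;
    q->count++;
    return 0;
}

/* Pick the non-congested OST with the most pending work; fall back to
 * any non-empty queue if every remaining OST is marked congested. */
static int pick_ost(void)
{
    int best = -1;
    for (int i = 0; i < NUM_OSTS; i++) {
        if (queues[i].count == 0 || queues[i].congested)
            continue;
        if (best < 0 || queues[i].count > queues[best].count)
            best = i;
    }
    if (best < 0)
        for (int i = 0; i < NUM_OSTS; i++)
            if (queues[i].count > 0) { best = i; break; }
    return best;
}

int main(void)
{
    /* Fake a striped file: chunk k lives on OST (k mod NUM_OSTS). */
    for (long k = 0; k < 16; k++)
        enqueue_chunk(k * (1L << 20), (int)(k % NUM_OSTS));

    queues[1].congested = 1;  /* pretend OST 1 is overloaded */

    int ost;
    while ((ost = pick_ost()) >= 0) {
        struct chunk c = queues[ost].items[--queues[ost].count];
        printf("read chunk at offset %ld from OST %d\n", c.file_offset, ost);
    }
    return 0;
}

In this toy run, chunks on the congested OST are deferred until the other queues are empty; the real scheduler would instead update congestion state continuously and overlap reads with CCI-based transfers.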
