LADS: Optimizing Data Transfers Using Layout-Aware Data Scheduling

While future terabit networks promise to significantly improve big-data movement among geographically distributed data centers, substantial challenges must be overcome even on today's 100-gigabit networks to realize end-to-end performance. Multiple bottlenecks exist along the end-to-end path from source to sink. The data storage infrastructure at both the source and the sink, and its interplay with the wide-area network, is increasingly the bottleneck to achieving high performance. In this paper, we identify the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and we present LADS, a new bulk data movement framework for terabit networks. LADS exploits the underlying storage layout at each endpoint to maximize throughput without negatively impacting the performance of shared storage resources for other users. LADS also uses the Common Communication Interface (CCI) in lieu of the sockets interface, allowing it to exploit zero-copy, OS-bypass hardware when available. It can further improve data transfer performance when the end systems are congested by buffering data at the source on flash storage. Our evaluations show that LADS can avoid congested storage elements within the shared storage resource, improving both I/O bandwidth and data transfer rates across high-speed networks.
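
The core idea of layout-aware scheduling can be illustrated with a small sketch: file chunks are grouped by the storage target that holds them, and I/O workers service targets that are not currently congested before returning to deferred ones. The code below is illustrative only and is not the LADS implementation; ost_of_chunk() and ost_is_congested() are hypothetical placeholders for the layout and load queries a real tool would perform (for example, parallel file system layout ioctls and server-side load statistics).

/*
 * Minimal sketch of layout-aware chunk scheduling (illustrative only).
 * Chunks are grouped into per-OST (object storage target) queues so that
 * I/O workers can defer targets that are currently congested.
 */
#include <stdio.h>

#define NUM_OSTS   4      /* assumed number of object storage targets */
#define NUM_CHUNKS 16     /* file split into fixed-size chunks        */

/* Hypothetical: map a chunk to the OST that stores it (round-robin striping). */
static int ost_of_chunk(int chunk) { return chunk % NUM_OSTS; }

/* Hypothetical: congestion signal, e.g., derived from recent I/O latency. */
static int ost_is_congested(int ost) { return ost == 2; /* pretend OST 2 is hot */ }

int main(void)
{
    int queue[NUM_OSTS][NUM_CHUNKS];
    int qlen[NUM_OSTS] = {0};

    /* Build per-OST work queues from the file's storage layout. */
    for (int c = 0; c < NUM_CHUNKS; c++) {
        int ost = ost_of_chunk(c);
        queue[ost][qlen[ost]++] = c;
    }

    /* Pass 1: service uncongested OSTs first. */
    for (int ost = 0; ost < NUM_OSTS; ost++) {
        if (ost_is_congested(ost))
            continue;
        while (qlen[ost] > 0)
            printf("read chunk %2d from OST %d\n", queue[ost][--qlen[ost]], ost);
    }

    /* Pass 2: drain chunks deferred behind congested OSTs.  A real scheduler
     * would re-sample congestion and interleave passes rather than wait. */
    for (int ost = 0; ost < NUM_OSTS; ost++)
        while (qlen[ost] > 0)
            printf("read deferred chunk %2d from OST %d\n",
                   queue[ost][--qlen[ost]], ost);
    return 0;
}

In this sketch, chunks striped across the congested target are simply serviced last; the point is that scheduling decisions follow the storage layout rather than the logical file offset order.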
