The Impact of Large-Data Transfers in Shared Wide-Area Networks: An Empirical Study

Abstract: Computational science sometimes requires large data files to be transferred over high bandwidth-delay-product (BDP) wide-area networks (WANs). Experimental data (e.g., LHC, SKA), analytics logs, and filesystem backups are regularly transferred between research centres and between private and public clouds. Fortunately, a variety of tools (e.g., GridFTP, UDT, PDS) have been developed to transfer bulk data across WANs with high performance. However, large-data transfer tools can adversely affect other applications on shared networks, because many of the tools explicitly ignore TCP fairness to achieve high performance. Users have experienced high latency and low bandwidth while a large-data transfer is underway, yet few empirical studies have quantified the impact of the tools. As an extension of our previous work using synthetic background traffic, we perform an empirical analysis of how bulk-data transfer tools perform when competing with a non-synthetic, application-based workload (e.g., Network File System). Conversely, we characterize the effect of the bulk-data transfers on that workload and show that, for example, NFS performance can drop from 29 Mb/s to less than 10 Mb/s (for a single stream) when competing with bulk-data transfers on a shared network.
