Modeling and Optimizing Large-Scale Wide-Area Data Transfers

Data generated by experimental, simulation, and observational science is growing exponentially. The resulting datasets are often transported over wide-area networks for storage, analysis, or visualization. Network bandwidth, which is not increasing at the same rate as dataset sizes, is becoming a key obstacle to data-driven sciences. In this paper, we focus on how bandwidth allocation can be controlled at the level of a protocol such as GridFTP, in view of goals such as maintaining certain priorities or performing scheduling with specified objectives. In particular, we explore how GridFTP transfer performance can be controlled by using parallelism and concurrency. We find that concurrency is a more powerful control knob than parallelism. For a source where most bandwidth is consumed by transfers to a small number of destinations, we build a model for each destination's achieved throughput in terms of its own concurrency and the total concurrency (over GridFTP transfers) to the other major destinations. We then enhance this model by including an indicator of the time-varying external load, using multiple ways to measure this external load. We study the effectiveness of the proposed models in controlling the bandwidth allocation. After evaluating the numerous combinations of models and methods of measuring external load, we narrow the field to the four best-performing ones, based on both their validation results and their applicability. After extensive testing of these four approaches, we find that they can obtain desired bandwidth allocations with a mean (median) error rate of 19.8% (13.8%), with 38% of the errors in our benchmark tests being less than 10% and 54% of them being less than 15%.
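To make the idea of concurrency as a control knob concrete, the following is a minimal Python sketch. It is not the model developed in the paper: it assumes a simple proportional-share form in which a destination's throughput is the link capacity scaled by its share of all competing streams. The function names, the `capacity` and `external_load` parameters, and the `max_cc` cap are all hypothetical and chosen only for illustration. The last helper shows how a mean and median relative error, of the kind reported above, could be computed from target and achieved throughputs.

```python
from statistics import mean, median


def predicted_throughput(own_cc, other_cc, capacity, external_load=0.0):
    """Hypothetical proportional-share model (an assumption, not the paper's model).

    own_cc        -- concurrency of transfers to this destination
    other_cc      -- total concurrency to the other major destinations
    capacity      -- assumed usable source bandwidth (any consistent unit)
    external_load -- assumed external load, expressed in "equivalent streams"
    """
    total = own_cc + other_cc + external_load
    if own_cc <= 0 or total <= 0:
        return 0.0
    return capacity * own_cc / total


def concurrency_for_target(target, other_cc, capacity, external_load=0.0, max_cc=64):
    """Return the smallest concurrency whose predicted throughput reaches the target.

    Falls back to max_cc if the target is not reachable under this toy model.
    """
    for cc in range(1, max_cc + 1):
        if predicted_throughput(cc, other_cc, capacity, external_load) >= target:
            return cc
    return max_cc


def relative_errors(targets, achieved):
    """Percentage error of each achieved throughput relative to its target."""
    return [abs(a - t) / t * 100.0 for t, a in zip(targets, achieved)]


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    cc = concurrency_for_target(target=5.0, other_cc=4, capacity=10.0)
    print("chosen concurrency:", cc)
    errs = relative_errors([10.0, 20.0, 8.0], [9.0, 25.0, 8.0])
    print("mean error %:", mean(errs), "median error %:", median(errs))
```

Under this assumed form, raising a destination's concurrency increases its share at the expense of the others, which is the qualitative behavior a controller would exploit. A fitted model with a measured external-load term would replace `predicted_throughput` in practice.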
