The File Mover: an efficient data transfer system for Grid applications

In this paper we present the File Mover, a data transfer system designed to optimize the transfer of potentially very large files. The File Mover relies on an overlay network architecture in which a set of machines cooperate in a transfer by forwarding portions of the file among themselves. Data transfer times are minimized by choosing, for each transfer, the set of relays that maximizes the expected throughput. Preliminary experiments show that the File Mover can profitably exploit existing network paths not chosen by IP routing algorithms, thereby improving file transfer performance.
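The relay-selection idea can be illustrated with a small sketch. The abstract does not specify how the File Mover actually chooses relays, so the code below is only one plausible reading: it assumes that the expected throughput of a path equals its smallest per-link bandwidth estimate, and it finds the path with the largest such bottleneck using a widest-path variant of Dijkstra's algorithm. The function name `widest_path`, the node names, and the bandwidth figures are all hypothetical.

```python
import heapq


def widest_path(bandwidth, source, target):
    """Return (bottleneck, path) for the path whose smallest link
    bandwidth is largest. `bandwidth` maps node -> {neighbor: estimate}.
    Assumption: path throughput is limited by its slowest link."""
    best = {source: float("inf")}   # best known bottleneck per node
    parent = {source: None}         # predecessor on the best path
    heap = [(-best[source], source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0.0):
            continue  # stale heap entry, a wider path was found already
        if node == target:
            break
        for neighbor, link_bw in bandwidth.get(node, {}).items():
            candidate = min(width, link_bw)
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (-candidate, neighbor))
    if target not in best:
        return 0.0, []  # target unreachable
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[target], path[::-1]


# Hypothetical bandwidth estimates (e.g. in Mbit/s) between overlay nodes.
estimates = {
    "src": {"dst": 10.0, "relay1": 80.0, "relay2": 40.0},
    "relay1": {"dst": 60.0, "relay2": 90.0},
    "relay2": {"dst": 70.0},
}
bottleneck, path = widest_path(estimates, "src", "dst")
print(bottleneck, path)
```

In this made-up topology the direct IP path offers only 10 units of bandwidth, while routing through both relays raises the bottleneck to 70, which is the kind of gain the abstract attributes to paths not chosen by IP routing. A real system would also have to refresh the estimates over time and account for relay load, neither of which is modeled here.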
