Budget-constrained bulk data transfer via internet and shipping networks

Cloud collaborators often wish to combine large amounts of data, on the order of terabytes, from multiple distributed locations into a single datacenter. Such groups face the challenge of reducing transfer latency without incurring excessive dollar costs. Pandora is an autonomic system that creates data transfer plans satisfying both latency and cost requirements by considering Internet transfers as well as disk shipments. Solving this planning problem is a critical step towards a truly autonomic bulk data transfer service. In this paper, we develop techniques to create an optimal transfer plan that minimizes transfer latency subject to a budget constraint. To explore the solution space systematically, we develop efficient binary search methods that find the optimal shipment transfer plan. Our experimental evaluation, driven by Internet bandwidth traces and actual shipment costs queried from FedEx web services, shows that these techniques work well on diverse, realistic networks.
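The following is a minimal sketch of the general idea of binary searching over a deadline under a budget; it is not Pandora's actual algorithm. It assumes a deliberately simplified toy model in which each source site independently either sends its data over the Internet at no dollar cost or picks one flat-rate shipping option. The names `min_cost_within` and `fastest_plan_within_budget`, the site tuples, and all numbers are hypothetical and chosen only for illustration. The search relies on one property: the cheapest cost of meeting a deadline never increases as the deadline is relaxed.

```python
import math


def min_cost_within(deadline_hours, sites):
    """Cheapest dollar cost to deliver every site's data within the deadline.

    Toy model (an assumption, not the paper's formulation): each site is a
    tuple (size_gb, gb_per_hour, shipping), where shipping is a list of
    (hours, dollars) options. Internet transfer is treated as free.
    Returns math.inf if some site cannot meet the deadline at all.
    """
    total = 0.0
    for size_gb, gb_per_hour, shipping in sites:
        options = []
        # Internet transfer is feasible if it finishes by the deadline.
        if size_gb <= gb_per_hour * deadline_hours:
            options.append(0.0)
        # Any shipping option that arrives by the deadline is feasible.
        options += [dollars for hours, dollars in shipping
                    if hours <= deadline_hours]
        if not options:
            return math.inf
        total += min(options)
    return total


def fastest_plan_within_budget(sites, budget, max_hours):
    """Smallest whole-hour deadline whose cheapest plan fits the budget.

    Binary search is valid because min_cost_within is non-increasing in the
    deadline. Returns None if even max_hours is not affordable.
    """
    if min_cost_within(max_hours, sites) > budget:
        return None
    lo, hi = 0, max_hours
    while lo < hi:
        mid = (lo + hi) // 2
        if min_cost_within(mid, sites) <= budget:
            hi = mid        # affordable: try a tighter deadline
        else:
            lo = mid + 1    # too expensive or infeasible: relax the deadline
    return lo


if __name__ == "__main__":
    # Hypothetical sites: (size in GB, Internet GB/hour, shipping options).
    example_sites = [
        (2000, 10, [(24, 120), (72, 40)]),
        (500, 25, [(24, 60)]),
    ]
    for budget in (10, 50, 120):
        print(budget, fastest_plan_within_budget(example_sites, budget, 240))
```

In this toy, each cost query is a simple per-site minimum. In a realistic setting the cost oracle would itself be an optimization over the combined Internet and shipping network, which is why keeping the number of oracle calls logarithmic in the deadline range matters.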
