Moving big data to the cloud

Cloud computing, rapidly emerging as a new computation paradigm, provides agile and scalable resource access in a utility-like fashion, especially for the processing of big data. An important open issue is how to efficiently move data that is generated at different geographical locations over time into a cloud for effective processing. The de facto approach of shipping hard drives is neither flexible nor secure. This work studies timely, cost-minimizing upload of massive, dynamically generated, geo-dispersed data into the cloud, for processing with a MapReduce-like framework. Targeting a cloud that encompasses disparate data centers, we model a cost-minimizing data migration problem and propose two online algorithms that optimize, at any given time, the choice of the data center for data aggregation and processing, as well as the routes for transmitting data there. The first is an online lazy migration (OLM) algorithm, which achieves a competitive ratio as low as 2.55 under typical system settings. The second is a randomized fixed horizon control (RFHC) algorithm, which achieves a competitive ratio of $1 + \frac{1}{l+1}\frac{\kappa}{\lambda}$ with a lookahead window of $l$, where $\kappa$ and $\lambda$ are system parameters of similar magnitude.
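
To give a feel for how the lookahead window affects the RFHC guarantee, the following minimal sketch evaluates the bound, read here as 1 + (1/(l+1))(κ/λ), for a few window sizes. The function name `rfhc_ratio_bound` and the sample values κ = λ = 1 are illustrative assumptions, not taken from the paper; the abstract only states that κ and λ are of similar magnitude.

```python
def rfhc_ratio_bound(lookahead: int, kappa: float, lam: float) -> float:
    """Evaluate 1 + (1 / (l + 1)) * (kappa / lam) for a lookahead window l.

    Hypothetical helper: the name and argument layout are assumptions
    made for illustration only.
    """
    if lookahead < 0:
        raise ValueError("lookahead window must be non-negative")
    if lam <= 0:
        raise ValueError("lam must be positive")
    return 1.0 + (1.0 / (lookahead + 1)) * (kappa / lam)


if __name__ == "__main__":
    # Assumed sample setting: kappa and lam equal, so kappa / lam = 1.
    for window in (0, 1, 4, 9):
        bound = rfhc_ratio_bound(window, kappa=1.0, lam=1.0)
        print(f"l = {window}: bound = {bound:.3f}")
```

Under this assumed setting the bound falls from 2 with no lookahead toward 1 as the window grows, which is consistent with the abstract's description of the guarantee improving with lookahead.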
