Distributing Storage in Cloud Environments

Cloud computing has a major impact on today's IT strategies. Outsourcing applications from IT departments to the cloud relieves users of building large infrastructures and the corresponding expertise, and allows them to focus on their core competencies and businesses. One of the main hurdles of cloud computing is that not only the application but also the data has to be moved to the cloud. Network speed severely limits the amount of data that can travel between the cloud and the user, between different sites of the same cloud provider, or between different cloud providers. It is therefore important to keep applications close to the data itself. This paper investigates how load balancing of the computational resources and data locality can be maintained at the same time. We apply recent results from balls-into-bins theory and test their applicability to cloud storage environments. We show that it is possible both to balance the load nearly perfectly and to keep the data close to its origin. The results are based on theoretical analyses and simulations of the underlying physical infrastructure of the Internet.
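To make the balls-into-bins idea concrete, the following minimal sketch places items (balls) on servers (bins) using a greedy multiple-choice rule in which every candidate server lies close to the item's origin. It is an illustration only, not the simulation used in the paper: the ring topology, the `radius` parameter, and the function name `place_items` are assumptions introduced here, whereas the paper's results rest on models of the Internet's physical infrastructure.

```python
import random


def place_items(num_servers, num_items, choices, radius, seed=0):
    """Return per-server loads after greedy multiple-choice placement on a ring.

    Hypothetical model: servers sit on a ring, and an item may only be
    stored within `radius` hops of the server where it originates.
    """
    rng = random.Random(seed)
    loads = [0] * num_servers
    for _ in range(num_items):
        origin = rng.randrange(num_servers)
        # Draw `choices` candidate servers, all within `radius` hops of the origin.
        candidates = [
            (origin + rng.randint(-radius, radius)) % num_servers
            for _ in range(choices)
        ]
        # Greedy rule: store the item on the least loaded candidate.
        target = min(candidates, key=lambda s: loads[s])
        loads[target] += 1
    return loads


if __name__ == "__main__":
    n = 10000
    for d in (1, 2):
        loads = place_items(n, n, choices=d, radius=5)
        print(f"choices={d}: max load = {max(loads)}")
```

Running the script prints the maximum load for one and for two local choices. In this toy setting, a second nearby candidate typically lowers the maximum load noticeably while every item stays within `radius` hops of its origin.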
