Intelligent Placement of Datacenters for Internet Services

Popular Internet services are hosted by multiple geographically distributed data centers. The location of the data centers has a direct impact on the services' response times, capital and operational costs, and (indirect) carbon dioxide emissions. Selecting a location involves many important considerations, including its proximity to population centers, power plants, and network backbones; the source of the electricity in the region; the electricity, land, and water prices at the location; and the average temperatures at the location. As there can be many potential locations and many issues to consider for each of them, the selection process can be extremely involved and time-consuming. In this paper, we focus on the selection process and its automation. Specifically, we propose a framework that formalizes the process as a non-linear cost optimization problem, along with approaches for solving the problem. Based on the framework, we characterize areas across the United States as potential locations for data centers, and delve deeper into seven interesting locations. Using the framework and our solution approaches, we illustrate the selection trade-offs by quantifying the minimum cost of (1) achieving different response times, availability levels, and consistency times, and (2) restricting services to green energy and chiller-less data centers. Among other interesting results, we demonstrate that the intelligent placement of data centers can save millions of dollars under a variety of conditions. We also demonstrate that the selection process is most efficient and accurate when it uses a novel combination of linear programming and simulated annealing.
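
To make the search idea concrete, the following is a minimal sketch of simulated annealing over subsets of candidate sites. It is not the paper's framework: the site names, costs, latencies, the latency bound, and the penalty are all hypothetical values chosen for illustration, and a simple closed-form cost function stands in for the linear-programming step that the paper combines with annealing.

```python
import math
import random

# Hypothetical candidate sites (illustrative values only, not from the paper):
# (name, yearly cost in $M, latency in ms to each of three user regions)
SITES = [
    ("A", 12.0, [10, 60, 90]),
    ("B", 9.0, [70, 15, 65]),
    ("C", 14.0, [95, 55, 12]),
    ("D", 7.0, [50, 45, 50]),
]
NUM_REGIONS = 3
MAX_LATENCY_MS = 40        # assumed response-time bound
PENALTY_PER_REGION = 25.0  # assumed cost ($M) per region that misses the bound


def cost(selected):
    """Total cost of a set of site indices: site costs plus latency penalties."""
    if not selected:
        return math.inf  # at least one data center is required
    fixed = sum(SITES[i][1] for i in selected)
    violations = 0
    for region in range(NUM_REGIONS):
        # Each region is served by its closest selected site.
        best_latency = min(SITES[i][2][region] for i in selected)
        if best_latency > MAX_LATENCY_MS:
            violations += 1
    return fixed + PENALTY_PER_REGION * violations


def anneal(initial, steps=2000, t_start=20.0, t_end=0.01, seed=0):
    """Simulated annealing: flip one site in or out of the set per step."""
    rng = random.Random(seed)
    current = set(initial)
    current_cost = cost(current)
    best, best_cost = set(current), current_cost
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = current ^ {rng.randrange(len(SITES))}
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
    return best, best_cost
```

Calling `anneal({0})` starts from a single hypothetical site and returns the cheapest set of sites seen during the search together with its cost. In a fuller treatment, `cost` would be replaced by the solution of a linear program for each candidate set, which is the kind of combination the abstract refers to.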
