Why should we integrate services, servers, and networking in a data center?

Since the early days of networks, a basic principle has been that endpoints treat the network as a black box. An endpoint injects a packet with a destination address and the network delivers the packet. This principle has served us well, and allowed us to scale the Internet to billions of devices using networks owned by competing companies and devices owned by billions of individuals. However, this approach might not be optimal for large-scale Internet data centers (DCs), such as those run by Amazon, Google, Microsoft and Yahoo, that employ custom software and customized hardware to increase efficiency and to lower costs. In DCs, all the components are controlled by a single entity, and creating services for the DC that treat the network as a black box will lead to inefficiencies. In DCs, there is the opportunity to rethink the relationship between servers, services and the network. We believe that, in order to enable more efficient intra-DC services, we should close the gap between the network, services and the servers. To this end, we have been building a direct server-to-server network topology, and have been looking at whether this makes common services quicker to implement and more efficient to operate.

[1]  John Moy,et al.  OSPF Version 2 , 1998, RFC.

[2]  Albert G. Greenberg,et al.  The cost of a cloud: research problems in data center networks , 2008, CCRV.

[3]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[4]  Ben Y. Zhao,et al.  Towards a Common API for Structured Peer-to-Peer Overlays , 2003, IPTPS.

[5]  David G. Kirkpatrick,et al.  On routing with guaranteed delivery in three-dimensional ad hoc wireless networks , 2008, Wirel. Networks.

[6]  Deborah Estrin,et al.  GHT: a geographic hash table for data-centric storage , 2002, WSNA '02.

[7]  Albert G. Greenberg,et al.  Towards a next generation data center architecture: scalability and commoditization , 2008, PRESTO '08.

[8]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[9]  Lei Shi,et al.  Dcell: a scalable and fault-tolerant network structure for data centers , 2008, SIGCOMM '08.

[10]  Brad Karp,et al.  GPSR: greedy perimeter stateless routing for wireless networks , 2000, MobiCom '00.

[11]  Michael B. Jones,et al.  FUSE: Lightweight Guaranteed Distributed Failure Notification , 2004, OSDI.

[12]  Robbert van Renesse,et al.  Navigating in the Storm: Using Astrolabe to Adaptively Configure Web Services and Their Clients , 2006, Cluster Computing.

[13]  Sudhakar Yalamanchili,et al.  Interconnection Networks: An Engineering Approach , 2002 .

[14]  Amin Vahdat,et al.  A scalable, commodity data center network architecture , 2008, SIGCOMM '08.

[15]  Miguel Castro,et al.  Scribe: a large-scale and decentralized application-level multicast infrastructure , 2002, IEEE J. Sel. Areas Commun..

[16]  GhemawatSanjay,et al.  The Google file system , 2003 .

[17]  Sergiu Nedevschi,et al.  Reducing Network Energy Consumption via Sleeping and Rate-Adaptation , 2008, NSDI.

[18]  Jennifer Rexford,et al.  Floodless in seattle: a scalable ethernet architecture for large enterprises , 2008, SIGCOMM '08.

[19]  Glen Gibb,et al.  NetFPGA: reusable router architecture for experimental research , 2008, PRESTO '08.

[20]  Brett D. Fleisch,et al.  The Chubby lock service for loosely-coupled distributed systems , 2006, OSDI '06.

[21]  Brad Karp,et al.  Greedy Perimeter Stateless Routing for Wireless Networks , 2000 .

[22]  Michael Isard,et al.  Autopilot: automatic data center management , 2007, OPSR.

[23]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[24]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.