Totoro: A Scalable and Fault-Tolerant Data Center Network by Using Backup Port

Scalability and fault tolerance have become fundamental challenges for data center network structures due to the explosive growth of data. Neither the structures proposed in the area of parallel computing nor those based on a tree hierarchy can satisfy both demands. In this paper, we propose Totoro, a scalable and fault-tolerant network that addresses these challenges by using the backup built-in Ethernet ports of servers. We connect a group of servers to an intra-switch to form a basic partition. We then use half of the backup ports to connect those basic partitions through inter-switches, building a larger partition. Totoro is defined hierarchically and recursively: a high-level Totoro is constructed from many low-level Totoros. Totoro can scale to millions of nodes. We also design a fault-tolerant routing protocol whose performance is very close to the performance bound. Our experiments show that Totoro is a viable interconnection structure for data centers.
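The recursive construction can be illustrated with a small counting sketch. It is not taken from the paper; it assumes that every switch has n ports (n even), that a basic partition holds n servers with one free backup port each, and that a level-i partition joins n level-(i-1) partitions, each contributing half of its free backup ports. The function name `totoro_size` and its parameters are hypothetical.

```python
def totoro_size(n: int, k: int) -> tuple[int, int, list[int]]:
    """Count servers, free backup ports, and inter-switches per level.

    Assumptions (not stated in the abstract): all switches have n ports,
    a level-0 partition has n servers on one intra-switch, and a level-i
    partition is built from n level-(i-1) partitions.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even integer >= 2")
    if k < 0:
        raise ValueError("k must be non-negative")

    servers = n          # level-0 partition: n servers on one intra-switch
    free_ports = n       # each server starts with one unused backup port
    inter_switches = []  # inter-switches added at levels 1..k

    for _ in range(k):
        used_per_copy = free_ports // 2       # half of the backup ports are used
        # Each n-port inter-switch takes one port from each of the n copies.
        inter_switches.append(used_per_copy)
        free_ports = n * (free_ports - used_per_copy)
        servers *= n

    return servers, free_ports, inter_switches


if __name__ == "__main__":
    servers, free_ports, switches = totoro_size(48, 3)
    print(servers, free_ports, switches)
```

Under these assumptions the server count grows as n^(k+1), so 48-port switches and three levels above the basic partition already give more than five million servers, which is consistent with the claim of scaling to millions of nodes. Half of the backup ports remain free at every level, which is what allows the structure to be extended further.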
