Exploiting Efficient and Scalable Shuffle Transfers in Future Data Center Networks

Distributed computing systems such as MapReduce in data centers transfer massive amounts of data across successive processing stages. Such shuffle transfers contribute most of the network traffic and make network bandwidth a bottleneck. In many commonly used workloads, the data flows in a shuffle transfer are highly correlated and are aggregated at the receiver side. To reduce network traffic and use the available network bandwidth efficiently, we propose to push the aggregation computation into the network and to parallelize the shuffle and reduce phases. In this paper, we first examine the gain and feasibility of in-network aggregation with BCube, a novel server-centric networking structure for future data centers. To exploit this gain, we model the in-network aggregation problem, which is NP-hard in BCube. We propose two approximation methods, for building an efficient IRS-based incast aggregation tree and an SRS-based shuffle aggregation subgraph, based solely on the labels of their members and the data center topology. We further design scalable forwarding schemes based on Bloom filters to implement in-network aggregation over massive numbers of concurrent shuffle transfers. Based on a prototype and large-scale simulations, we demonstrate that our approaches can significantly decrease the amount of network traffic and save data center resources. Our approaches for BCube can be adapted to other server-centric network structures for future data centers with minimal modifications.
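To make the in-network aggregation gain concrete, the sketch below counts server-to-server hops for an incast transfer in BCube. It relies on two properties of BCube: each server carries a (k+1)-digit base-n label, and two servers are one hop apart (through a switch) exactly when their labels differ in one digit, so the shortest path length equals the Hamming distance between labels. The tree construction here is not the IRS-based method of this paper; it is a simple baseline assumed for illustration, in which every flow corrects the highest-order differing digit first. Because the next hop then depends only on the current server and the receiver, the union of all paths forms a tree rooted at the receiver. The cost model (one unit of data per hop, perfect aggregation so that merged flows remain one unit) and all function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A BCube server label (a_k, ..., a_0): k+1 digits, each in range(n).
Label = Tuple[int, ...]


def hamming(a: Label, b: Label) -> int:
    """Shortest-path length in server hops: servers one digit apart share a switch."""
    return sum(x != y for x, y in zip(a, b))


def next_hop(u: Label, dst: Label) -> Label:
    """Correct the highest-order digit on which u differs from dst (assumed fixed order)."""
    for i, (x, y) in enumerate(zip(u, dst)):
        if x != y:
            return u[:i] + (y,) + u[i + 1:]
    return u  # u is already the destination


def aggregation_tree(senders: Iterable[Label], receiver: Label) -> Set[Tuple[Label, Label]]:
    """Union of all sender paths; a tree because next_hop depends only on (u, receiver)."""
    edges: Set[Tuple[Label, Label]] = set()
    for s in senders:
        u = s
        while u != receiver:
            v = next_hop(u, receiver)
            if (u, v) in edges:
                break  # the rest of this path is already in the tree
            edges.add((u, v))
            u = v
    return edges


def traffic(senders: Iterable[Label], receiver: Label) -> Tuple[int, int]:
    """Hypothetical cost model: one data unit per hop, merged flows stay one unit."""
    senders = list(senders)
    unicast = sum(hamming(s, receiver) for s in senders)
    in_network = len(aggregation_tree(senders, receiver))
    return unicast, in_network


print(traffic([(1, 1), (2, 1), (3, 1)], (0, 0)))  # (6, 4)
```

In BCube(4,1), the three senders above need 6 hops as separate unicast flows but only 4 once their flows merge at (0, 1). If all 15 other servers send to (0, 0), the counts become 24 and 15, since the tree then has exactly one edge per sender.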
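The abstract states that forwarding relies on Bloom filters but does not spell out the encoding. The following is a minimal sketch under an assumed scheme: each server keeps a Bloom filter holding the IDs of the shuffle transfers for which it acts as an aggregation point, and tests each arriving packet's transfer ID against it. The `handle_packet` rule and the transfer-ID strings are hypothetical; the filter itself is the standard structure (m bits, k positions derived by double hashing from one SHA-256 digest), so it may report false positives but never false negatives for inserted keys.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: m bits, k hash positions, no deletions."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str) -> list:
        # Double hashing: index_i = h1 + i * h2 (mod m), with h2 forced odd.
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))


def handle_packet(local_filter: BloomFilter, transfer_id: str) -> str:
    """Hypothetical rule: aggregate if this server is (probably) an aggregation
    point for the transfer, otherwise relay the packet toward the receiver."""
    return "aggregate" if transfer_id in local_filter else "forward"


server_filter = BloomFilter(m=1024, k=4)
server_filter.add("shuffle-7")
print(handle_packet(server_filter, "shuffle-7"))  # aggregate
```

The filter's size stays fixed at m bits however many transfers a server tracks, which is what makes such a structure attractive for massive numbers of concurrent transfers. The price is a false-positive rate of roughly (1 - e^(-kn/m))^k after n insertions, so under this assumed scheme a server may occasionally intercept a packet it should have relayed.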
