Application-aware software-defined networking to accelerate MapReduce applications

The rise of Internet of Things sensors, social networking, and mobile devices has led to an explosion of available data. The need to gain insights from this data has given rise to the field of Big Data analytics. The MapReduce (MR) framework, as implemented in Hadoop, has become the de facto standard for Big Data analytics, and it forms the base platform for many of the Big Data technologies in use today. To handle ever-increasing data sizes, Hadoop is designed as a scalable framework that allows a seemingly unbounded number of dedicated servers to participate in the analytics process. The response time of an analytics request is an important factor in time to value/insights. While compute and disk I/O capacity can be scaled with the number of servers, scaling the system also increases network traffic. Arguably, the communication-heavy phase of MR contributes significantly to the overall response time. This problem is further aggravated when communication patterns are heavily skewed, which is not uncommon in MR workloads. MR applications normally run in large data centers (DCs) that employ dense network topologies (e.g., multi-rooted trees) with multiple paths available between any pair of hosts. These DC network designs, combined with recent software-defined networking (SDN) programmability, offer a new opportunity to configure the network dynamically and intelligently in order to shorten application runtime. The initial intuition motivating our work is that the well-defined structure of MR and the rich traffic demand information available in Hadoop's log and metadata files could be used to guide network control. We therefore conjecture that an application-aware network control (i.e., one that knows the application-level semantics and traffic demands) can improve the performance of MR applications when compared to state-of-the-art application-agnostic network control.
To confirm our thesis, we first studied MR systems in detail and identified typical communication patterns and common causes of network-related performance bottlenecks in MR applications. We then studied the state of the art in DC networks and evaluated its ability to handle MapReduce-like communication patterns. Our results confirmed the assumption that existing techniques cannot deal with MR communication patterns, mainly because they lack visibility into application-level information. Based on these findings, we proposed an architecture for application-aware network control in DCs running MR applications. We implemented a prototype within an SDN controller and used it to successfully accelerate MR applications. Depending on the network oversubscription ratio, we demonstrated a 2% to 58% reduction in job completion time for popular MR benchmarks when compared to ECMP (the de facto flow allocation algorithm in multipath DC networks), thus confirming the thesis. Other contributions include a method to predict network demands in MR applications, algorithms to identify the critical communication path in the MR shuffle and to dynamically allocate paths to flows in a multipath network, and an emulation-based testbed for realistic MR workloads.
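The contrast between application-agnostic and application-aware flow allocation can be illustrated with a small sketch. This is not the algorithm from this work: it assumes a simplified model in which every flow has a known demand and all paths are interchangeable, it stands in for ECMP with a CRC32 hash of a hypothetical flow identifier, and it uses a plain greedy "largest demand first, least-loaded path" heuristic as one possible demand-aware strategy. Host names and demand values are made up for illustration.

```python
import zlib
from typing import List, Tuple

# (source host, destination host, demand in MB); all values are hypothetical.
Flow = Tuple[str, str, int]


def ecmp_allocate(flows: List[Flow], num_paths: int) -> List[int]:
    """Application-agnostic stand-in for ECMP: hash the flow id to pick a path.

    The hash ignores the flow's demand, so large flows may collide.
    """
    loads = [0] * num_paths
    for src, dst, demand in flows:
        path = zlib.crc32(f"{src}->{dst}".encode()) % num_paths
        loads[path] += demand
    return loads


def demand_aware_allocate(flows: List[Flow], num_paths: int) -> List[int]:
    """Greedy heuristic (an assumption, not the authors' method):
    place the largest remaining flow on the currently least-loaded path.
    """
    loads = [0] * num_paths
    for _src, _dst, demand in sorted(flows, key=lambda f: f[2], reverse=True):
        path = loads.index(min(loads))
        loads[path] += demand
    return loads


# Four shuffle flows from mappers (m*) to reducers (r*) over two equal paths.
flows = [("m1", "r1", 8), ("m2", "r1", 7), ("m3", "r2", 3), ("m4", "r2", 2)]
ecmp_loads = ecmp_allocate(flows, num_paths=2)
aware_loads = demand_aware_allocate(flows, num_paths=2)

print("hash-based per-path load:  ", ecmp_loads)
print("demand-aware per-path load:", aware_loads)
```

In this toy input the demand-aware heuristic balances the two paths evenly, whereas the hash-based choice depends only on the flow identifiers and may or may not do so. The example only illustrates why knowing traffic demands in advance can help; it does not model oversubscription, flow arrival times, or the critical path of the shuffle.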

[87]  Xin Wu,et al.  DARD: Distributed Adaptive Routing for Datacenter Networks , 2012, 2012 IEEE 32nd International Conference on Distributed Computing Systems.