Network-aware resource management for scalable data analytics frameworks

Sharing cluster resources between multiple frameworks, applications and datasets is important for organizations doing large scale data analytics. It improves cluster utilization, avoids standalone clusters running only a single framework and allows data scientists to choose the best framework for each analysis task. Current systems for cluster resource management like YARN or Mesos achieve resource sharing using containers. Analytics frameworks execute their tasks in these containers. However, currently the container placement is based predominantly on available computing capabilities in terms of cores and memory, yet neglects to also take the network topology and data locations into account. In this paper, we propose a container placement approach that (a) takes the network topology into account to prevent network congestions in the core network and (b) places containers close to input data to improve data locality and reduce remote disk reads in distributed file systems. The main advantages of introducing topology- and data-awareness on the level of container placement is that multiple application frameworks benefit from improvements. We present a prototype integrated with Hadoop YARN and an evaluation with workloads consisting of different applications and datasets using Apache Flink. Our evaluation on a 64 core cluster, in which nodes are connected through a fat tree topology, shows promising results with speedups of up to 67% for network-intensive workloads.

[1]  Cheng-Zhong Xu,et al.  Interference and locality-aware task scheduling for MapReduce applications in virtual clusters , 2013, HPDC.

[2]  Michael I. Jordan,et al.  Managing data transfers in computer clusters with orchestra , 2011, SIGCOMM.

[3]  Ion Stoica,et al.  The Power of Choice in Data-Aware Cluster Scheduling , 2014, OSDI.

[4]  Wenfei Fan,et al.  Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data , 2014 .

[5]  Carlo Curino,et al.  Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.

[6]  Amin Vahdat,et al.  Hedera: Dynamic Flow Scheduling for Data Center Networks , 2010, NSDI.

[7]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[8]  David E. Culler,et al.  Hierarchical scheduling for diverse datacenter workloads , 2013, SoCC.

[9]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[10]  Volker Markl,et al.  Spinning Fast Iterative Data Flows , 2012, Proc. VLDB Endow..

[11]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[12]  Michael Kaminsky,et al.  Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles , 2013, SOSP 2013.

[13]  Albert G. Greenberg,et al.  Scarlett: coping with skewed content popularity in mapreduce clusters , 2011, EuroSys '11.

[14]  Scott Shenker,et al.  Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling , 2010, EuroSys '10.

[15]  Anupam Das,et al.  Transparent and Flexible Network Management for Big Data Processing in the Cloud , 2013, HotCloud.

[16]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[17]  Felix Naumann,et al.  The Stratosphere platform for big data analytics , 2014, The VLDB Journal.

[18]  Mung Chiang,et al.  Multiresource allocation: fairness-efficiency tradeoffs in a unifying framework , 2013, TNET.

[19]  Keith Kirkpatrick,et al.  Software-defined networking , 2013, CACM.

[20]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[21]  Randy H. Katz,et al.  Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.

[22]  Andrew V. Goldberg,et al.  Quincy: fair scheduling for distributed computing clusters , 2009, SOSP '09.

[23]  Scott Shenker,et al.  Making Sense of Performance in Data Analytics Frameworks , 2015, NSDI.

[24]  Kostas Katrinis,et al.  Pythia: Faster Big Data in Motion through Predictive Software-Defined Network Optimization at Runtime , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[25]  Dominic Battré,et al.  Nephele/PACTs: a programming model and execution framework for web-scale analytical processing , 2010, SoCC '10.

[26]  Jignesh M. Patel,et al.  Storm@twitter , 2014, SIGMOD Conference.

[27]  Stefan Schmid,et al.  Kraken: Towards Elastic Performance Guarantees in Multi-tenant Data Centers , 2015, SIGMETRICS.

[28]  Luke M. Leslie,et al.  Cross-Layer Scheduling in Cloud Systems , 2015, 2015 IEEE International Conference on Cloud Engineering.

[29]  M. Abadi,et al.  Naiad: a timely dataflow system , 2013, SOSP.

[30]  Hitesh Ballani,et al.  Towards predictable datacenter networks , 2011, SIGCOMM 2011.

[31]  Wei Lin,et al.  Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing , 2014, OSDI.

[32]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[33]  Michael Abd-El-Malek,et al.  Omega: flexible, scalable schedulers for large compute clusters , 2013, EuroSys '13.

[34]  Matei Zaharia,et al.  Job Scheduling for Multi-User MapReduce Clusters , 2009 .

[35]  Diksha Verma,et al.  Quincy: Fair Scheduling for Distributed Computing Clusters , 2014 .

[36]  Benjamin Hindman,et al.  Dominant Resource Fairness: Fair Allocation of Multiple Resource Types , 2011, NSDI.

[37]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[38]  Charles E. Leiserson,et al.  Fat-trees: Universal networks for hardware-efficient supercomputing , 1985, IEEE Transactions on Computers.

[39]  Alekh Jindal,et al.  Hadoop++ , 2010 .

[40]  Carlo Curino,et al.  Reservation-based Scheduling: If You're Late Don't Blame Us! , 2014, SoCC.