GB-PANDAS: Throughput and heavy-traffic optimality analysis for affinity scheduling

Dynamic affinity scheduling has been an open problem for nearly three decades. The problem is to dynamically schedule multi-type tasks to multi-skilled servers such that the resulting queueing system is stable throughout the capacity region (throughput optimality) and the mean delay of tasks is minimized at high loads near the boundary of the capacity region (heavy-traffic optimality). In terms of applications, data-intensive analytics frameworks such as MapReduce, Hadoop, and Dryad fit this setting: the set of servers is heterogeneous for different task types, so the pair of task type and server determines the processing rate of the task. The load balancing algorithm used in such frameworks is an example of affinity scheduling, and it should be both robust and delay optimal at high loads when hot-spots occur. Fluid model planning, the MaxWeight algorithm, and the generalized cµ-rule are among the first algorithms proposed for affinity scheduling that have theoretical guarantees of optimality in different senses, which will be discussed in the related work section. None of these algorithms is practical for data center applications because of their unrealistic assumptions. The join-the-shortest-queue-MaxWeight (JSQ-MaxWeight), JSQ-Priority, and weighted-workload algorithms are examples of load balancing policies for systems with two and three levels of data locality with a rack structure. In this work, we propose the Generalized-Balanced-Pandas algorithm (GB-PANDAS) for a system with multiple levels of data locality and prove its throughput optimality. We prove this result under an arbitrary distribution for service times, whereas most previous theoretical work assumes a geometric distribution for service times. Extensive simulation results show that the GB-PANDAS algorithm reduces the mean delay, outperforming the JSQ-MaxWeight algorithm by up to twofold at high loads.
We believe that the GB-PANDAS algorithm is heavy-traffic optimal in a larger region than JSQ-MaxWeight; proving this is an interesting problem for future work.
