Proportion Scheduler to Improve the Mismatched Locality in YARN

YARN is a prevailing centralized resource management architecture that allocates a group of resources to each application. The resource group is chosen to match the locality requests of the application's tasks. Each application, however, then assigns the resources in its group to individual tasks according to the locality capacities of the resources. Because the two levels of allocation follow different rules, tasks end up with mismatched localities, which hurts application performance. Mismatched task locality in YARN has received little research attention. This paper designs the Proportion scheduler to reduce mismatched localities without delaying applications. The locality capacities of resources become the unified allocation rule at both levels. Resources are classified by the locality requests of tasks, so that resources in the same category are interchangeable for tasks with the same level of locality request. This classification lowers the probability of a mismatch. In addition, improving locality requires compromises between applications: every application is assigned a proportional share of the resources at each locality level to improve performance. Compared with baseline schedulers, Proportion achieves twice as many data-local tasks and more than 30% more rack-local tasks. It reduces application makespan by up to 66.7% and network traffic by up to 80%.
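
The two ideas in the abstract, classifying resources by locality level and giving each application a proportional share at every level, can be illustrated with a small sketch. This is not the scheduler described in the paper: the function names, the three-level locality model, the input shapes, and the largest-remainder rounding rule are all assumptions made for illustration, and nothing here uses the real YARN API.

```python
# Illustrative sketch only; names and rounding rule are hypothetical.
LEVELS = ("data-local", "rack-local", "off-rack")


def locality_level(node, rack_of, wanted_nodes):
    """Classify a node relative to the nodes that hold a task's input data."""
    if node in wanted_nodes:
        return "data-local"
    wanted_racks = {rack_of[n] for n in wanted_nodes}
    if rack_of[node] in wanted_racks:
        return "rack-local"
    return "off-rack"


def proportional_shares(free_by_level, demand):
    """Split the free containers of each locality level among applications
    in proportion to their demand at that level.

    free_by_level: {level: number of free containers}
    demand:        {app: {level: number of requested containers}}
    Integer shares are rounded with the largest-remainder method.
    """
    shares = {app: {} for app in demand}
    for level in LEVELS:
        wants = {app: d.get(level, 0) for app, d in demand.items()}
        total = sum(wants.values())
        grant = min(free_by_level.get(level, 0), total)
        if total == 0:
            for app in demand:
                shares[app][level] = 0
            continue
        # Integer arithmetic: floor share plus remainder used for tie-breaking.
        base = {app: grant * w // total for app, w in wants.items()}
        rem = {app: grant * w % total for app, w in wants.items()}
        leftover = grant - sum(base.values())
        for app in sorted(wants, key=lambda a: rem[a], reverse=True)[:leftover]:
            base[app] += 1
        for app in demand:
            shares[app][level] = base[app]
    return shares


if __name__ == "__main__":
    free = {"data-local": 4, "rack-local": 3, "off-rack": 10}
    demand = {
        "A": {"data-local": 6, "rack-local": 1},
        "B": {"data-local": 2, "rack-local": 4},
    }
    print(proportional_shares(free, demand))
```

In this toy model no application receives more containers at a level than it requested, and scarce data-local containers are divided by demand rather than handed to whichever application asks first.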
