ScalScheduling: A Scalable Scheduling Architecture for MPI-based Interactive Analysis Programs

In today's large-scale clusters, running tasks with a high degree of parallelism allows interactive data visualization and analysis to complete in seconds. However, conventional centralized scheduling poses significant challenges for these interactive applications. As the amount of data to be processed grows, it becomes too costly to move across the network. Data processing tasks should therefore be scheduled so that the amount of transferred data is minimized, i.e., so that computation runs where the data resides (data locality computation). To achieve this, a scheduler process must collect and analyze data distribution metadata before making scheduling decisions, which usually adds milliseconds to seconds of latency. Such scheduling delay is unacceptable for interactive data applications. In this paper, we present a scalable scheduling architecture for conventional interactive data programs, which we refer to as ScalScheduling. ScalScheduling aims to reduce task scheduling latency while ensuring that worker processes achieve a high degree of data locality and load balance in heterogeneous environments. In the proposed architecture, each worker process uses a novel modulo-based priority method to schedule its local tasks independently. Multiple scheduler processes, whose number is chosen according to the number of worker processes, handle concurrent requests and assign remote tasks with respect to load balance. We perform experiments using thousands of parallel processes, and the experimental results show the benefits of the proposed scheduling architecture as well as its potential for future very large task scheduling problems on large-scale clusters.
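
The abstract does not spell out the modulo-based priority method, so the following is only an illustrative sketch of how a worker might rank its local tasks without contacting a central scheduler. The rotation rule, the function names (`local_priority`, `plan_local_tasks`), and the `replicas` mapping are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Optional


def local_priority(task_id: int, holders: List[int], worker: int) -> Optional[int]:
    """Hypothetical modulo rule: rotate the sorted replica holders by task_id.

    Returns 0 for the preferred worker of a task, larger values for
    fallback workers, and None if `worker` holds no replica of the data.
    """
    if worker not in holders:
        return None
    ordered = sorted(holders)
    start = task_id % len(ordered)  # index of the preferred holder
    return (ordered.index(worker) - start) % len(ordered)


def plan_local_tasks(replicas: Dict[int, List[int]], worker: int) -> List[int]:
    """Order the tasks whose data is local to `worker` by priority, then by id.

    Every worker can compute this from the same replica map on its own,
    so no scheduler round trip is needed for local tasks.
    """
    ranked = []
    for task_id, holders in replicas.items():
        priority = local_priority(task_id, holders, worker)
        if priority is not None:
            ranked.append((priority, task_id))
    return [task_id for _, task_id in sorted(ranked)]


# Assumed toy layout: task id -> ranks of the workers holding its input data.
replicas = {0: [0, 1], 1: [0, 1], 2: [1, 2], 3: [0, 2]}
for rank in range(3):
    print(rank, plan_local_tasks(replicas, rank))
```

In this sketch every task has exactly one priority-0 worker, so workers that process their lists in order tend to start on disjoint tasks. Tasks left over on slow nodes would still need a separate mechanism, such as the scheduler processes the abstract mentions for remote assignment.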
