A taxonomy of application scheduling tools for high performance cluster computing

Application scheduling plays an important role in high-performance cluster computing and can be classified into job scheduling and task scheduling. This paper surveys software tools for graph-based scheduling on cluster systems, with a focus on task scheduling. The tasks of a parallel or distributed application can be scheduled onto multiple processors to optimize the performance of the program (e.g., execution time or resource utilization). In general, scheduling algorithms are based on the notion of a task graph, which represents the relationships among parallel tasks. The scheduling algorithms map the nodes of the graph to processors in order to minimize overall execution time. Although many scheduling algorithms have been proposed in the literature, surprisingly few tools are available for practical use. After discussing the fundamental scheduling techniques, we propose a framework and taxonomy for scheduling tools on clusters. Using this framework, we analyze and compare the features of existing scheduling tools. We also discuss important issues in improving the usability of scheduling tools.
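The mapping of task-graph nodes to processors can be illustrated with a minimal sketch of greedy list scheduling. This is one common family of heuristics and is not attributed to any specific tool covered by the survey. The function name `list_schedule`, the dictionary-based graph encoding, the bottom-level priority, and the assumptions of identical processors and strictly positive computation costs are all choices made for this illustration.

```python
from collections import defaultdict


def list_schedule(cost, edges, num_procs):
    """Greedy list scheduling of a task graph (DAG) onto identical processors.

    cost:  dict task -> computation time (assumed strictly positive)
    edges: dict (u, v) -> communication time, paid only when u and v
           are placed on different processors (an assumption of this sketch)
    Returns (placement, makespan) where placement[task] = (proc, start, finish).
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    for (u, v) in edges:
        preds[v].append(u)
        succs[u].append(v)

    # Priority = bottom level: longest path (computation plus communication)
    # from the task to an exit node of the graph.
    blevel = {}

    def bottom_level(t):
        if t not in blevel:
            tail = max((edges[(t, s)] + bottom_level(s) for s in succs[t]), default=0)
            blevel[t] = cost[t] + tail
        return blevel[t]

    # With positive costs, a predecessor always has a larger bottom level than
    # its successors, so this order respects all precedence constraints.
    order = sorted(cost, key=lambda t: -bottom_level(t))

    proc_free = [0] * num_procs  # time at which each processor becomes idle
    placement = {}
    for t in order:
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives after a delay.
            ready = max(
                (placement[u][2] + (0 if placement[u][0] == p else edges[(u, t)])
                 for u in preds[t]),
                default=0,
            )
            start = max(ready, proc_free[p])
            if best is None or start < best[1]:
                best = (p, start)
        p, start = best
        placement[t] = (p, start, start + cost[t])
        proc_free[p] = start + cost[t]

    makespan = max(finish for _, _, finish in placement.values())
    return placement, makespan


# Hypothetical fork-join task graph: A feeds B and C, which both feed D.
cost = {"A": 2, "B": 3, "C": 3, "D": 1}
edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
placement, makespan = list_schedule(cost, edges, num_procs=2)
```

On this small graph the sketch places B and C on different processors so that they overlap, which shortens the schedule relative to running all four tasks on a single processor. Practical tools additionally have to deal with heterogeneous processors, network contention, and inaccurate cost estimates, none of which are modeled here.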
