Effects of Job and Task Placement on Parallel Scientific Applications Performance

This paper studies the influence that task placement may have on the performance of applications, mainly due to the relationship between communication locality and overhead. This impact is studied for torus and fat-tree topologies. A simulation-based performance study is carried out, using traces of applications and application kernels, to measure the time taken to complete one or several concurrent instances of a given workload. Because the purpose of the paper is not to offer a miraculous task placement strategy but to measure the impact that placement has on performance, we selected simple strategies, including random placement. The quantitative results of these experiments show that different workloads present different degrees of responsiveness to placement. Furthermore, both the number of concurrent parallel jobs sharing a machine and the size of its network have a clear impact on the time to complete a given workload. We conclude that the efficient exploitation of a parallel computer requires scheduling policies that are aware of application behavior and network topology.
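The link between placement and communication locality can be illustrated with a small sketch. This is not the trace-driven simulation used in the paper; it only computes the mean hop count between communicating tasks on a 2D torus, which is a crude proxy for communication overhead. The 8x8 torus size, the ring communication pattern, and all function names are assumptions chosen for illustration.

```python
import random

def torus_distance(a, b, dims):
    """Minimal hop count between nodes a and b in a torus of the given dimensions."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def node_coords(index, dims):
    """Map a linear node index to torus coordinates (row-major order)."""
    coords = []
    for d in reversed(dims):
        coords.append(index % d)
        index //= d
    return tuple(reversed(coords))

def mean_hops(placement, dims, pairs):
    """Average network distance over all communicating task pairs.

    placement[t] is the linear index of the node that hosts task t.
    """
    total = sum(
        torus_distance(node_coords(placement[s], dims),
                       node_coords(placement[t], dims), dims)
        for s, t in pairs
    )
    return total / len(pairs)

def ring_pairs(n_tasks):
    """Hypothetical workload: each task sends to its successor in a ring."""
    return [(i, (i + 1) % n_tasks) for i in range(n_tasks)]

dims = (8, 8)                       # assumed 8x8 torus
n_tasks = dims[0] * dims[1]
pairs = ring_pairs(n_tasks)

consecutive = list(range(n_tasks))  # task i on node i
shuffled = consecutive[:]
random.Random(0).shuffle(shuffled)  # random placement, fixed seed

consecutive_hops = mean_hops(consecutive, dims, pairs)
random_hops = mean_hops(shuffled, dims, pairs)
print("consecutive placement:", consecutive_hops)
print("random placement:     ", random_hops)
```

For this ring pattern, consecutive placement keeps almost every message between adjacent nodes, while random placement spreads partners across the torus and raises the mean distance. A workload with little communication, or with an all-to-all pattern, would show a much smaller gap, which is consistent with the observation that workloads differ in their responsiveness to placement.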
