Processor allocation on Cplant: achieving general processor locality using one-dimensional allocation strategies

The Computational Plant, or Cplant, is a commodity-based supercomputer under development at Sandia National Laboratories. This paper describes resource-allocation strategies that achieve processor locality for parallel jobs on Cplant and other supercomputers. Users of Cplant and other Sandia supercomputers submit parallel jobs to a job queue. When a job is scheduled to run, it is assigned to a set of processors. To maximize throughput, jobs should be allocated to localized clusters of processors, which minimizes communication costs and avoids bandwidth contention caused by overlapping jobs. This paper introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted-free-list strategy previously used on Cplant. These new allocation strategies are implemented in the new release of the Cplant System Software, Version 2.0, phased into the Cplant systems at Sandia by May 2002.
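As an illustration of the general idea only, the following minimal sketch orders the processors of a small two-dimensional mesh along a Hilbert space-filling curve and then treats allocation as a one-dimensional problem over curve ranks. The abstract does not give the details of the Cplant implementation. The function names (`hilbert_d2xy`, `curve_order`, `allocate`), the square power-of-two mesh, and the "smallest span of curve ranks" selection rule are all assumptions made for this example.

```python
def hilbert_d2xy(order, d):
    """Map curve rank d to (x, y) on a 2**order by 2**order mesh.

    Standard iterative Hilbert-curve conversion; the mesh shape is an
    assumption made for this sketch.
    """
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or reflect the quadrant so sub-curves join up.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def curve_order(order):
    """List mesh coordinates in the order the curve visits them."""
    n = 1 << order
    return [hilbert_d2xy(order, d) for d in range(n * n)]


def allocate(free_ranks, k):
    """Pick k free curve ranks whose span along the curve is smallest.

    Hypothetical selection rule for illustration. Returns a sorted list
    of ranks, or None if fewer than k processors are free.
    """
    ranks = sorted(free_ranks)
    if k <= 0 or k > len(ranks):
        return None
    # Slide a window of k free ranks and keep the tightest one.
    best = min(range(len(ranks) - k + 1),
               key=lambda i: ranks[i + k - 1] - ranks[i])
    return ranks[best:best + k]


if __name__ == "__main__":
    mesh = curve_order(2)            # 4 x 4 mesh, 16 processors
    free = set(range(len(mesh)))     # every processor starts free
    job = allocate(free, 4)
    print([mesh[r] for r in job])    # mesh coordinates assigned to the job
```

Because consecutive ranks on the curve are neighbors in the mesh, a set of ranks with a small span tends to map to a compact group of processors, which is the locality property the abstract is after.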
