Application-aware management of parallel simulation collections

This paper presents a system deployed on parallel clusters to manage a collection of parallel simulations that make up a computational study. It explores how such a system can extend traditional parallel job scheduling and resource allocation techniques to incorporate knowledge specific to the study. Using a UINTAH-based helium gas simulation code (ARCHES) and the SimX system for multi-experiment computational studies, this paper demonstrates that, by using application-specific knowledge in resource allocation and scheduling decisions, one can reduce the run time of a computational study from over 20 hours to under 4.5 hours on a 32-processor cluster, and from almost 11 hours to just over 3.5 hours on a 64-processor cluster.
