Ensemble Toolkit: Scalable and Flexible Execution of Ensembles of Tasks

Many scientific applications require scalable task-level parallelism, along with support for the flexible execution and coupling of ensembles of simulations. Most high-performance system software and middleware, however, is designed to support the execution and optimization of single tasks. Motivated by these missing capabilities and the increasing importance of task-level parallelism, we introduce Ensemble Toolkit, which provides the following application development features: (i) abstractions that enable the expression of ensembles as primary entities, and (ii) support for ensemble-based execution patterns that capture the majority of application scenarios. Ensemble Toolkit uses a scalable pilot-based runtime system that decouples workload execution and resource management details from the expression of the application, and enables the efficient and dynamic execution of ensembles on heterogeneous computing resources. We investigate three execution patterns and characterize the scalability and overhead of Ensemble Toolkit for these patterns. We measure scaling properties for up to O(1000) concurrent ensembles and O(1000) cores and find linear weak and strong scaling behaviour.
