Co-scheduling Ensembles of In Situ Workflows

Molecular dynamics (MD) simulations are widely used to study large-scale molecular systems. HPC systems are ideal platforms for these studies; however, reaching the simulation timescales needed to observe rare processes remains challenging, even on modern supercomputers. To overcome this timescale limitation, a single long MD trajectory is replaced by multiple short simulations that run concurrently as an ensemble. Analyses are usually co-scheduled with these simulations so that, thanks to in situ techniques, the large volumes of data the simulations generate can be processed efficiently at runtime. Executing a workflow ensemble of simulations and their in situ analyses therefore requires efficient co-scheduling strategies and careful management of computational resources so that the components do not slow each other down. In this paper, we propose an efficient method to co-schedule simulations and in situ analyses such that the makespan of the workflow ensemble is minimized. We present a novel approach to allocating resources for a workflow ensemble under resource constraints, based on a theoretical framework that models the execution of the workflow ensemble. We evaluate the proposed approach on various workflow ensemble configurations using an accurate simulator built on the WRENCH simulation framework. Results demonstrate the importance of co-scheduling simulations with the in situ analyses they exchange data with, so as to benefit from data locality: inefficient scheduling decisions can increase the makespan by up to a factor of 30.
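The abstract does not spell out the resource-allocation model, so the following is only a toy illustration of the underlying problem, not the method proposed in this paper. It assumes (hypothetically) that each ensemble member is a simulation paired with one in situ analysis, that both scale according to Amdahl's law, that the two are pipelined so each step costs the slower of the two, and that the ensemble makespan is the makespan of the slowest member. Under a fixed core budget, a simple greedy heuristic repeatedly gives one more core to the bottleneck component of the slowest member. All names and parameter values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A simulation or analysis task (hypothetical model, not from the paper)."""
    name: str
    seq_time: float       # assumed time per step on a single core
    parallel_frac: float  # assumed Amdahl parallel fraction in [0, 1]
    cores: int = 1

    def step_time(self) -> float:
        # Amdahl's law: T(p) = T1 * ((1 - f) + f / p)
        return self.seq_time * ((1.0 - self.parallel_frac)
                                + self.parallel_frac / self.cores)


@dataclass
class Member:
    """One ensemble member: a simulation coupled with one in situ analysis."""
    sim: Component
    ana: Component
    steps: int

    def makespan(self) -> float:
        # Assumption: simulation and analysis are pipelined, so each step
        # costs the slower of the two components.
        return self.steps * max(self.sim.step_time(), self.ana.step_time())

    def bottleneck(self) -> Component:
        return self.sim if self.sim.step_time() >= self.ana.step_time() else self.ana


def allocate(members: list[Member], budget: int) -> float:
    """Greedy heuristic: grant cores one at a time to the slowest member's bottleneck."""
    used = 2 * len(members)  # every component starts with one core
    if used > budget:
        raise ValueError("budget too small: each component needs at least one core")
    while used < budget:
        worst = max(members, key=lambda m: m.makespan())
        worst.bottleneck().cores += 1
        used += 1
    # Ensemble makespan is set by the slowest member.
    return max(m.makespan() for m in members)


members = [
    Member(Component("sim0", 100.0, 0.95), Component("ana0", 40.0, 0.90), steps=10),
    Member(Component("sim1", 60.0, 0.95), Component("ana1", 30.0, 0.80), steps=10),
]
ensemble_makespan = allocate(members, budget=16)
print(f"ensemble makespan: {ensemble_makespan:.1f}")
```

With one core per component, the first member alone would take 10 × max(100, 40) = 1000 time units, so any useful allocation should bring the ensemble makespan below that. The greedy rule is chosen here only for brevity; a model-based allocation such as the one the paper describes would also account for data locality and interference between co-located components, which this sketch ignores.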