Science automation in practice: Performance data farming in workflows

This paper describes an approach to conducting large-scale parameter studies in which each data point requires the execution of an entire scientific workflow. We show how a parameter study system can be integrated with a workflow management system to seamlessly execute a large number of workflows, each with different input parameter values, on large-scale computing infrastructure. The work is motivated by the need to collect performance-related data for a sensitivity analysis of the relationship between workflow input parameters and the performance of tasks in a workflow developed for the Spallation Neutron Source (SNS) facility at Oak Ridge National Laboratory.
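The core idea, a parameter study whose unit of work is a whole workflow run rather than a single job, can be illustrated with a minimal sketch. The sketch assumes a full-factorial experiment design and uses a crude main-effect range as the sensitivity measure. Both are illustrative choices, not necessarily the design or analysis used in the paper. The parameter names (`temperature`, `timestep_fs`), the helper functions, and `toy_workflow` are all hypothetical. In a real deployment, `run_workflow` would submit one workflow instance to the workflow management system and return measured task performance. Here it is replaced by a toy analytic model so the example runs on its own.

```python
import itertools
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical workflow input parameters (illustrative values only,
# not taken from the SNS workflow).
PARAMETER_SPACE: Dict[str, Sequence[float]] = {
    "temperature": [300.0, 600.0, 900.0],
    "timestep_fs": [0.5, 1.0, 2.0],
}

Result = Tuple[Dict[str, float], float]  # (input values, measured runtime)


def experiment_design(space: Dict[str, Sequence[float]]) -> List[Dict[str, float]]:
    """Full-factorial design: one dict of input values per workflow run."""
    names = list(space)
    combos = itertools.product(*(space[n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]


def farm(space: Dict[str, Sequence[float]],
         run_workflow: Callable[[Dict[str, float]], float]) -> List[Result]:
    """Execute one whole workflow per design point and record its runtime.

    Runs sequentially for clarity; a real system would dispatch these
    workflow instances in parallel to the computing infrastructure.
    """
    return [(point, run_workflow(point)) for point in experiment_design(space)]


def main_effect_range(results: List[Result], name: str) -> float:
    """Crude sensitivity measure: spread of mean runtime across the levels of one parameter."""
    by_level: Dict[float, List[float]] = {}
    for point, runtime in results:
        by_level.setdefault(point[name], []).append(runtime)
    means = [statistics.mean(v) for v in by_level.values()]
    return max(means) - min(means)


def toy_workflow(params: Dict[str, float]) -> float:
    """Hypothetical stand-in for a real workflow run: a toy runtime model."""
    return 10.0 + 0.01 * params["temperature"] + 5.0 / params["timestep_fs"]


if __name__ == "__main__":
    results = farm(PARAMETER_SPACE, toy_workflow)
    for name in PARAMETER_SPACE:
        print(name, main_effect_range(results, name))
```

A full-factorial design grows multiplicatively with the number of parameters and levels. Because every point here costs an entire workflow execution, larger studies would typically switch to a sparser design (for example, Latin hypercube sampling) while keeping the same farm-then-analyse structure.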