Empowering Agroecosystem Modeling with HTC Scientific Workflows: The Cycles Model Use Case

Scientific workflows have enabled large-scale scientific computations and data analysis, and have lowered the entry barrier for performing computations on distributed heterogeneous platforms (e.g., HTC and HPC). In spite of impressive achievements to date, large-scale modeling, simulation, and data analytics in the long tail still face several challenges, such as the efficient scheduling and execution of large-scale workflows $(\mathrm{O}(10^{6}))$ with very short-running tasks (a few seconds). While the current trend toward supporting next-generation workflows on leadership-class machines has gained much attention in recent years, at the other end of the spectrum scientific workflows from long-tail science have become larger and require processing massive volumes of data. In this paper, we report on our experience in designing and implementing an HTC workflow for agroecosystem modeling. We leverage well-known (task clustering and co-scheduling) and emerging (hierarchical workflows and containers) workflow optimization techniques to make the workflow planning problem tractable and to maximize resource utilization and the degree of task parallelism. Experimental results from a use-case implementation show that, by strategically combining these techniques and defining an appropriate set of optimization parameters, the overall workflow makespan can be improved by 3.5 orders of magnitude compared to a regular (non-optimized) execution of the workflow.
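To illustrate why task clustering matters for workflows made of many very short tasks, the following is a minimal sketch, not the implementation used in the paper. It groups independent tasks into fixed-size clusters and compares a rough makespan estimate with and without clustering. The function names, the simple wave-based cost model, and all numeric values (5-second tasks, 60-second per-job overhead, 100 slots, clusters of 1,000) are assumptions chosen for illustration only.

```python
from math import ceil


def cluster_tasks(task_ids, cluster_size):
    """Group independent tasks into fixed-size clusters (hypothetical helper).

    Each cluster is submitted as one job, so the per-job scheduling
    overhead is paid once per cluster instead of once per task.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be >= 1")
    return [
        task_ids[i:i + cluster_size]
        for i in range(0, len(task_ids), cluster_size)
    ]


def estimated_makespan(num_tasks, task_runtime, job_overhead, cluster_size, slots):
    """Rough makespan estimate under an assumed, simplified cost model.

    Every job pays `job_overhead` once and then runs its tasks
    sequentially; jobs are executed in waves over `slots` workers.
    """
    num_jobs = ceil(num_tasks / cluster_size)
    job_runtime = job_overhead + cluster_size * task_runtime
    waves = ceil(num_jobs / slots)
    return waves * job_runtime


# Illustrative (assumed) parameters: 10^6 tasks of 5 s each,
# 60 s of overhead per submitted job, 100 execution slots.
tasks = list(range(1_000_000))
clusters = cluster_tasks(tasks, 1000)

baseline = estimated_makespan(1_000_000, 5, 60, 1, 100)      # one task per job
clustered = estimated_makespan(1_000_000, 5, 60, 1000, 100)  # 1,000 tasks per job

print(f"jobs without clustering: {len(tasks)}, with clustering: {len(clusters)}")
print(f"estimated makespan: {baseline} s vs {clustered} s")
```

Under this toy model the per-job overhead dominates when each task is its own job, which is why clustering shortens the estimate. Real gains depend on the scheduler, the data staging costs, and the cluster size chosen.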
