Climate Science Performance, Data and Productivity on Titan

Climate science models are flagship codes for the largest high performance computing (HPC) resources, both in visibility, with the newly launched Department of Energy (DOE) Accelerated Climate Model for Energy (ACME) effort, and in the significant fraction of system usage they account for. The performance of the DOE ACME model is captured with application-level timers and examined through a sizeable run archive. Performance and variability of compute, queue time, and ancillary services are examined. As climate science advances in its use of HPC resources, the human and data systems required to achieve program goals have grown. Current workflow processes (hardware, software, human) and planned automation of the workflow are described, along with historical and projected usage of data in motion and data at rest. Together, these two topics motivate a description of future system requirements for DOE climate modeling efforts, focusing on the growth in data storage and on the network and disk bandwidth required to handle data at an acceptable rate.
