Experiences using smaash to manage data-intensive simulations

High-performance scientific simulations, such as those built with the University of Chicago's FLASH code, generate enormous amounts of data that must be captured, cataloged, and analyzed. Unless this is done systematically, monitoring such simulations, tracking and reproducing old ones, and analyzing and archiving their output can be haphazard and idiosyncratic. Smaash, a simulation management and analysis system developed at the University of Chicago and Argonne National Laboratory, addresses these problems by offering a near-single point of control and analysis, a metadata database, and a set of tools that automate tasks scientists have traditionally performed by hand. Smaash was designed to be independent of the particular simulation code and is accessible from many computing platforms. It is automated and standardized, and was built with open-source software tools. Data security is considered throughout the process, yet users are insulated from onerous verification procedures. Because the system was developed with feedback from scientific users, its user interface reflects how scientists actually work. We describe the system and a typical simulation it was designed to support, and we illustrate its utility with several examples from our experience of freeing scientists from the data-manipulation phase of high-performance computing so they can focus on computational results and their analysis.
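To make the idea of an automatically populated metadata database concrete, here is a minimal sketch in Python of the kind of per-run record such a system might capture at submission time. The table layout, field names, and the record_run helper are illustrative assumptions for this sketch, not smaash's actual schema or interface.

```python
# Hypothetical sketch of the metadata capture a system like smaash automates.
# The "runs" table and its columns are illustrative assumptions, not the
# actual smaash schema.
import getpass
import sqlite3
import time

def record_run(db_path, code_name, code_version, platform, output_dir, params):
    """Catalog one simulation run so it can be monitored, tracked,
    and reproduced later without hand-kept notes."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               id INTEGER PRIMARY KEY,
               code_name TEXT, code_version TEXT,
               platform TEXT, output_dir TEXT,
               submitted_by TEXT, submitted_at REAL,
               params TEXT)"""
    )
    conn.execute(
        "INSERT INTO runs (code_name, code_version, platform, output_dir,"
        " submitted_by, submitted_at, params) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (code_name, code_version, platform, output_dir,
         getpass.getuser(), time.time(), repr(params)),
    )
    conn.commit()
    conn.close()

# Example: catalog a FLASH run; the machine name and paths are made up.
record_run("runs.db", "FLASH", "3.2", "intrepid",
           "/scratch/sn_detonation/run042", {"resolution": 512})
```

A real deployment would hook a call like this into the job-submission script, so the catalog is populated as a side effect of running the simulation rather than as a separate manual step.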
