The Evolution of Bits and Bottlenecks in a Scientific Workflow Trying to Keep Up with Technology: Accelerating 4D Image Segmentation Applied to NASA Data

In 2016, a team of earth scientists directly engaged a team of computer scientists to identify cyberinfrastructure (CI) approaches that would speed up an earth science workflow. This paper describes the evolution of that workflow as the two teams bridged CI and an image segmentation algorithm to conduct large-scale earth science research. Resources from the Pacific Research Platform (PRP) and the Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) were used to decrease the workflow's wall-clock time from 19.5 days to 53 minutes. The improvement comes from the use of network appliances, improved image segmentation, deployment of a containerized workflow, and the earth scientists' growing CI experience and training. Covering a three-year period, this paper describes the innovations introduced to improve the workflow, the bottlenecks identified in each workflow version, and the improvements made within each version.
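For perspective, the reported reduction from 19.5 days to 53 minutes works out to roughly a 530-fold speedup: 19.5 days is 19.5 × 24 × 60 = 28,080 minutes, and 28,080 / 53 ≈ 529.8. The short Python sketch below reproduces that arithmetic. The function name `speedup` is an illustrative placeholder and is not part of the authors' workflow.

```python
# Wall-clock times reported in the abstract.
DAYS_BEFORE = 19.5
MINUTES_AFTER = 53


def speedup(days_before: float, minutes_after: float) -> float:
    """Return the ratio of the old wall-clock time to the new one, both in minutes.

    Illustrative helper only (hypothetical name), not taken from the paper.
    """
    minutes_before = days_before * 24 * 60  # 19.5 days -> 28,080 minutes
    return minutes_before / minutes_after


if __name__ == "__main__":
    # Prints the rounded speedup factor for the reported times.
    print(f"{speedup(DAYS_BEFORE, MINUTES_AFTER):.0f}x")  # prints "530x"
```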
