Globus Automation Services: Research process automation across the space-time continuum

Research process automation (the reliable, efficient, and reproducible execution of linked sets of actions on scientific instruments, computers, data stores, and other resources) has emerged as an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of actions (flows), and the execution of such flows in heterogeneous research environments. To support flows with broad spatial extent (e.g., from scientific instrument to remote data center) and temporal extent (from seconds to weeks), these Globus automation services feature: 1) cloud hosting for reliable execution of even long-lived flows despite sporadic failures; 2) a simple specification and extensible asynchronous action provider API, for defining and executing a wide variety of actions and flows involving heterogeneous resources; 3) an event-driven execution model for automating execution of flows in response to arbitrary events; and 4) a rich security model enabling authorization delegation mechanisms for secure execution of long-running actions across distributed resources. These services permit researchers to outsource and automate the management of a broad range of research tasks to a reliable, scalable, and secure cloud platform. We present use cases for Globus automation services, describe their design and implementation, present microbenchmark studies, and review experiences applying the services in a range of applications.
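As a rough illustration of the asynchronous action pattern described above, the following Python sketch models a flow as an ordered list of actions, each started by one call and then polled until it completes. It is a toy under stated assumptions, not the Globus API: the class `ToyActionProvider`, the function `run_flow`, and the status strings are hypothetical names chosen for this example, and everything runs in memory rather than against remote services.

```python
class ToyActionProvider:
    """Hypothetical in-memory stand-in for an asynchronous action provider.

    An action is started with run() and completes only after a fixed
    number of status() polls, mimicking a long-running remote task.
    """

    def __init__(self, func, polls_needed=2):
        self.func = func                  # work performed by the action
        self.polls_needed = polls_needed  # polls before the action finishes
        self._next_id = 0
        self._remaining = {}              # action_id -> polls still required
        self._results = {}                # action_id -> computed result

    def run(self, body):
        """Start an action and return its identifier immediately."""
        action_id = self._next_id
        self._next_id += 1
        self._remaining[action_id] = self.polls_needed
        self._results[action_id] = self.func(body)
        return action_id

    def status(self, action_id):
        """Report whether the action is still active or has succeeded."""
        if self._remaining[action_id] > 0:
            self._remaining[action_id] -= 1
            return {"status": "ACTIVE"}
        return {"status": "SUCCEEDED", "details": self._results[action_id]}


def run_flow(steps, flow_input):
    """Run each step in order; a step's result is the next step's input."""
    state = flow_input
    for provider in steps:
        action_id = provider.run(state)
        reply = provider.status(action_id)
        # A real client would wait between polls; the toy polls immediately.
        while reply["status"] == "ACTIVE":
            reply = provider.status(action_id)
        state = reply["details"]
    return state


if __name__ == "__main__":
    double = ToyActionProvider(lambda x: x * 2)
    add_three = ToyActionProvider(lambda x: x + 3)
    print(run_flow([double, add_three], 5))
```

Separating the start call from the status call is what lets a caller track actions that outlive any single request, which is the property the abstract attributes to flows lasting from seconds to weeks.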
