The Globus Galaxies platform: delivering science gateways as a service

The use of public cloud computers to host sophisticated scientific data and software is transforming scientific practice by enabling broad access to capabilities previously available only to the few. The primary obstacle to more widespread use of public clouds to host scientific software (‘cloud‐based science gateways’) has thus far been the considerable gap between the specialized needs of science applications and the capabilities provided by cloud infrastructures. We describe here a domain‐independent, cloud‐based science gateway platform, the Globus Galaxies platform, which overcomes this gap by providing a set of hosted services that directly address the needs of science gateway developers. The design and implementation of this platform draw on our several years of experience with Globus Genomics, a cloud‐based science gateway that has served more than 200 genomics researchers across 30 institutions. Building on that foundation, we have implemented a platform that uses the popular Galaxy system for application hosting and workflow execution; Globus services for data transfer, user and group management, and authentication; and a cost‐aware elastic provisioning model specialized for public cloud resources. We present the capabilities and architecture of this platform, describe six scientific domains in which we have successfully applied it, report on user experiences, and analyze the economics of our deployments. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
