MAKER as a Service: Moving HPC Applications to Jetstream Cloud

As cloud resources become more widely available as an execution platform, transitioning applications between HPC systems and the cloud becomes a necessity. However, because of the complex setup and system-specific demands of these applications, the transition is difficult and may not scale as desired. Jetstream is an NSF-funded cloud service that aims to provide these resources to users in a dynamically allocated manner. In this work we examine three key areas to focus on when transitioning between resources: providing a portable, reproducible environment; scaling between local and remote resources; and using feedback to the user to inform configuration and runtime decisions. Building on the MAKER bioinformatics application, we have deployed WQ-MAKER on the Jetstream cloud platform, helping to annotate over 30 genomes and reducing runtimes from days to hours and from weeks to days.
