Aether: Leveraging Linear Programming for Optimal Cloud Computing in Genomics

Across biology, the scale of data production is growing rapidly without a corresponding increase in data analysis capabilities. Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective, and scalable framework that uses linear programming (LP) to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach minimizes the cost of data analysis while maximizing its efficiency and speed. As a test, we used Aether to de novo assemble 1572 metagenomic samples, a task it completed in 13 hours at a cost approximately 80% lower than that of comparable methods.
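To make the LP idea concrete, the following is a minimal sketch of how choosing a mix of cloud instances can be posed as a linear program. It is not Aether's actual formulation: the instance names, prices, capacities, and the `cheapest_mix` helper are hypothetical, and the model is a continuous relaxation, so it may return fractional instance counts that a real deployment would need to round or solve as an integer program. It assumes SciPy is installed and uses `scipy.optimize.linprog`.

```python
from scipy.optimize import linprog

# Hypothetical catalogue (illustrative values only, not real prices):
# (name, price in USD per hour, vCPUs, memory in GiB, max instances available)
INSTANCES = [
    ("small", 0.05, 4, 16, 10),
    ("medium", 0.12, 16, 64, 10),
    ("large", 0.40, 64, 256, 10),
]


def cheapest_mix(instances, need_vcpus, need_mem_gib):
    """Return ({name: count}, hourly_cost) for the cheapest feasible mix.

    Minimizes total hourly price subject to meeting the vCPU and memory
    requirements, with each instance type capped by its availability.
    """
    cost = [price for _, price, _, _, _ in instances]
    # linprog expects constraints of the form A_ub @ x <= b_ub, so the
    # "at least this much capacity" constraints are negated.
    a_ub = [
        [-vcpus for _, _, vcpus, _, _ in instances],
        [-mem for _, _, _, mem, _ in instances],
    ]
    b_ub = [-need_vcpus, -need_mem_gib]
    # Each count lies between 0 and the number of instances available.
    bounds = [(0, cap) for _, _, _, _, cap in instances]

    result = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=bounds)
    if not result.success:
        raise ValueError(f"no feasible mix: {result.message}")

    names = [name for name, _, _, _, _ in instances]
    return dict(zip(names, result.x)), result.fun


if __name__ == "__main__":
    mix, hourly_cost = cheapest_mix(INSTANCES, need_vcpus=800, need_mem_gib=3200)
    for name, count in mix.items():
        print(f"{name}: {count:.2f}")
    print(f"hourly cost: {hourly_cost:.2f} USD")
```

In this toy catalogue the largest instance type is the cheapest per vCPU, so the solver fills that type up to its availability cap before falling back to the next cheapest. A bidding system would additionally have to model fluctuating spot prices and the risk of instances being reclaimed, which this sketch leaves out.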
