Toil enables reproducible, open source, big biomedical data analyses

open sharing of protocols. With a precise ontology to describe standardized protocols, it may be possible to share methods widely and create community standards. We envisage that in future individual research laboratories, or clusters of colocated laboratories, will have in-house, low-cost automation work cells but will access DNA foundries via the cloud to carry out complex experimental workflows. Technologies enabling this from companies such as Emerald Cloud Lab (S. San Francisco, CA, USA), Synthace (London) and Transcriptic (Menlo Park, CA, USA) could, for example, send experimental designs to foundries and return output data to a researcher. This ‘mixed economy’ should accelerate the development and sharing of standardized protocols and metrology standards and shift a growing proportion of molecular, cellular and synthetic biology into a fully quantitative and reproducible era.
