The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update

Abstract

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and more than 5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and remain integral to Galaxy's development. The number of Galaxy public servers, the number of developers contributing to the Galaxy framework and its tools, and the number of users of the main Galaxy server have all increased substantially.
