Refinery: a data management, analysis, and visualization platform utilizing the Galaxy workbench

Managing data, running analysis workflows, and visualizing data in a reproducible way is a common challenge across the bioinformatics community. These three tasks are typically isolated from one another, which makes comprehensive tracking of data lineage, or data provenance, particularly difficult. Here we provide an update on the Refinery Platform, which addresses these challenges with a centralized data repository and tools to explore, analyze, and visualize data. Our platform enables users to: 1) annotate and store all their data in a private, searchable database; 2) share data sets with other members via simple-to-use collaboration features; 3) run Galaxy Workflows on their data in a scalable fashion; and 4) launch interactive Docker-based visualizations on their raw data and derived results. We have recently improved our capabilities for performing genomic analyses by leveraging Galaxy's Dataset Collections and by exposing Galaxy Workflow parameters through our Tool Launching interface. We have also made a large effort in the visualization tool space. Users can now launch interactive visualizations on their data from within our platform, and prospective tool developers can wrap arbitrary web-based visualization tools in Docker containers with minimal overhead. Our vision is that the Refinery Platform will become a central resource for a community of researchers keen on tracking their data provenance and conducting reproducible science.