Improving Publication and Reproducibility of Computational Experiments through Workflow Abstractions

The current practice of publishing articles that contain only textual descriptions of methods is error-prone and incomplete. Even when a reproducible workflow or notebook is linked to an article, the text of the article is not well integrated with those computational components, and the workflow or notebook focuses mostly on implementation details that are disconnected from the scientific approach described in the text. Through an analysis of three multi-omics articles, we illustrate why this makes computational methods difficult to understand, reproduce, compare, and reuse. We propose workflow abstractions that capture different concepts and perspectives that are important to scientists. These abstractions connect the text of an article to the corresponding workflow and provide a framework to improve the publication and reproducibility of computational experiments.
