Towards a scientific blockchain framework for reproducible data analysis

Publishing reproducible analyses is a long-standing and widespread challenge for the scientific community, funding bodies and publishers. Although a definitive solution remains elusive, the problem is recognized to affect all disciplines and to cause a critical systemic inefficiency. Here we propose a blockchain-based approach to enhance scientific reproducibility, with a focus on life-science studies and precision medicine. While the value of permanently encoding all key study information (endpoints, data and metadata, protocols, analytical methods and all findings) into an immutable ledger has already been highlighted, here we apply the blockchain approach to the problem of rewarding the time and expertise of scientists who commit to verifying reproducibility. Our mechanism builds a trustless ecosystem of researchers, funding bodies and publishers that cooperate to guarantee digital, permanent access to information and to reproducible results. As a natural byproduct, we obtain a procedure for quantifying the reputation of scientists and institutions for ranking purposes.
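The core ledger idea, permanently recording study information so that later tampering is detectable, can be illustrated with a minimal hash-chained record store. This is a sketch only: the `StudyLedger` class and the record field names are illustrative assumptions, not the mechanism specified in the paper.

```python
import hashlib
import json


def _hash_block(contents: dict) -> str:
    """Deterministic SHA-256 digest of a block's contents."""
    payload = json.dumps(contents, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class StudyLedger:
    """Toy append-only hash chain of study records (endpoints, protocols, findings)."""

    def __init__(self):
        self.chain = []

    def append(self, study_record: dict) -> str:
        """Append a record; each block commits to the previous block's hash."""
        contents = {
            "index": len(self.chain),
            "prev_hash": self.chain[-1]["hash"] if self.chain else "0" * 64,
            "record": study_record,
        }
        block = dict(contents, hash=_hash_block(contents))
        self.chain.append(block)
        return block["hash"]

    def verify(self) -> bool:
        """Recompute every digest; any altered record breaks the chain."""
        for i, block in enumerate(self.chain):
            contents = {k: block[k] for k in ("index", "prev_hash", "record")}
            if block["hash"] != _hash_block(contents):
                return False
            if i > 0 and block["prev_hash"] != self.chain[i - 1]["hash"]:
                return False
        return True
```

Because each block's hash covers the previous block's hash, rewriting any earlier record invalidates every subsequent link, which is what makes the recorded protocols and findings effectively immutable once published.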
