Introducing the Open Science Chain: Protecting Integrity and Provenance of Research Data

Data sharing is an integral component of research and academic publications, allowing for independent verification of results. Researchers can extend and build upon prior research when they can efficiently access, validate, and verify the data referenced in publications. Despite the well-known benefits of making research data more open, data withholding rates have remained constant. Disincentives to sharing research data include lack of credit and fear that data will be misrepresented in the absence of context and provenance. While several research data repositories focus on making research data available, no cyberinfrastructure platform enables researchers to efficiently validate the authenticity of datasets, track their provenance, view the lineage of the data, and verify ownership information. In this paper, we introduce and provide an overview of the NSF-funded Open Science Chain, a cyberinfrastructure platform built using blockchain technologies. The platform securely stores metadata and verification information about research data and tracks changes to that data in an auditable manner, in order to address issues related to reproducibility and accountability in scientific research.
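The idea of storing verification information about a dataset and tracking its changes in an auditable way can be illustrated with a minimal sketch. This is not the Open Science Chain implementation: the function names (`fingerprint`, `append_record`, `verify_ledger`, `matches_latest`), the record fields, and the in-memory list standing in for a blockchain ledger are all assumptions made for illustration. Each record holds a SHA-256 hash of the dataset plus the hash of the previous record, so altering any earlier record breaks the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first record's predecessor


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest used as the dataset's verification value."""
    return hashlib.sha256(data).hexdigest()


def _hash_body(body: dict) -> str:
    """Hash a record body using a deterministic JSON serialization."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_record(ledger: list, dataset: bytes, owner: str, note: str) -> dict:
    """Append a metadata record linked to the previous record by its hash."""
    body = {
        "owner": owner,                      # hypothetical ownership field
        "note": note,                        # hypothetical change description
        "data_hash": fingerprint(dataset),   # only the hash is stored, not the data
        "prev_hash": ledger[-1]["record_hash"] if ledger else GENESIS,
    }
    record = dict(body, record_hash=_hash_body(body))
    ledger.append(record)
    return record


def verify_ledger(ledger: list) -> bool:
    """Check that every record is intact and correctly linked to its predecessor."""
    prev = GENESIS
    for record in ledger:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if body["prev_hash"] != prev or _hash_body(body) != record["record_hash"]:
            return False
        prev = record["record_hash"]
    return True


def matches_latest(ledger: list, dataset: bytes) -> bool:
    """Check whether a dataset copy matches the most recently recorded version."""
    return bool(ledger) and ledger[-1]["data_hash"] == fingerprint(dataset)
```

In this sketch, a researcher would call `append_record` once per dataset version, and anyone holding a copy of the data could call `matches_latest` to check it against the recorded hash. A real deployment would replace the Python list with a distributed ledger so that no single party can rewrite the history.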
