It’s all in the timing: calibrating temporal penalties for biomedical data sharing

Objective: Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. Various initiatives make these datasets more widely available to recipients who sign Data Use Certificate agreements, under which penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy rests on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether the assumption holds and what it implies for data science policy.

Methods: This study tests the hypothesis that the value of data to scientific investigators, measured by the impact of the publications based on the data, decreases over time. The hypothesis is tested formally with a linear mixed-effects model using approximately 1200 publications from 2007 to 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health.

Results: The analysis shows that the impact factors of publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, the depreciation rate is slow: only ∼10% per year, on average.

Conclusion: The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements, and that alternative penalty mechanisms may need to be invoked.
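To make the scale of the reported depreciation concrete, the following minimal sketch estimates how much data value would remain after a temporary revocation. It assumes the ∼10% average annual rate compounds geometrically over fractions of a year, which is an illustrative simplification and not the fitted model from the study; the function name `retained_fraction` and its parameters are hypothetical.

```python
def retained_fraction(months_revoked: float, annual_rate: float = 0.10) -> float:
    """Fraction of data value left after a revocation lasting `months_revoked`.

    Assumption (not from the study): value decays geometrically, so a
    fraction (1 - annual_rate) survives each full year and partial years
    are interpolated by exponentiation.
    """
    if months_revoked < 0:
        raise ValueError("months_revoked must be non-negative")
    if not 0.0 <= annual_rate < 1.0:
        raise ValueError("annual_rate must be in [0, 1)")
    return (1.0 - annual_rate) ** (months_revoked / 12.0)


if __name__ == "__main__":
    # Compare a few revocation lengths under the assumed 10% annual rate.
    for months in (3, 6, 12):
        kept = retained_fraction(months)
        print(f"{months:>2} months revoked: {kept:.1%} retained, {1 - kept:.1%} lost")
```

Under this assumption, a three-month revocation would cost a violator less than 3% of the data's value, which is consistent with the concern that short revocations may be a weak deterrent.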
