Public sharing of research datasets: A pilot study of associations

The public sharing of primary research datasets potentially benefits the research community but is not yet common practice. In this pilot study, we analyzed whether data sharing frequency was associated with funder and publisher requirements, journal impact factor, or investigator experience and impact. Across 397 recent biomedical microarray studies, we found investigators were more likely to publicly share their raw dataset when their study was published in a high-impact journal and when the first or last authors had high levels of career experience and impact. We estimate the USA's National Institutes of Health (NIH) data sharing policy applied to 19% of the studies in our cohort; being subject to the NIH data sharing plan requirement was not found to correlate with increased data sharing behavior in multivariate logistic regression analysis. Studies published in journals that required a database submission accession number as a condition of publication were more likely to share their data, but this trend was not statistically significant. These early results will inform our ongoing larger analysis, and hopefully contribute to the development of more effective data sharing initiatives.
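As an illustration of the kind of multivariate logistic regression mentioned above, the following minimal sketch fits a model of a binary "shared data" outcome on two predictors. It is not the study's analysis: the data are synthetic, and the variable names (`impact`, `nih`), the cohort size of 400, the generating coefficients, and the plain gradient-ascent fitting routine are all assumptions chosen only to show how each predictor's adjusted odds ratio is read off the fitted coefficients.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, lr=0.5, epochs=1000):
    """Fit weights (intercept first) by batch gradient ascent on the log-likelihood."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))
            err = y - p
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        # Step along the averaged gradient.
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights


# Synthetic cohort (hypothetical, NOT the study's data).
random.seed(0)
rows, labels = [], []
for _ in range(400):
    impact = random.gauss(0.0, 1.0)               # standardized journal impact factor
    nih = 1.0 if random.random() < 0.2 else 0.0   # hypothetical funder-policy flag
    true_logit = -0.5 + 1.0 * impact + 0.0 * nih  # assumed generating model
    shared = 1 if random.random() < sigmoid(true_logit) else 0
    rows.append([impact, nih])
    labels.append(shared)

weights = fit_logistic(rows, labels)
# exp(coefficient) is the odds ratio for a one-unit change in that predictor,
# holding the other predictor fixed.
odds_ratios = [math.exp(w) for w in weights[1:]]
print("intercept:", round(weights[0], 3))
print("odds ratio, impact factor:", round(odds_ratios[0], 3))
print("odds ratio, NIH policy flag:", round(odds_ratios[1], 3))
```

An odds ratio above 1 indicates that a predictor is associated with more sharing after adjusting for the others, and one near 1 indicates no detectable association. A real analysis would also report confidence intervals and would normally rely on an established statistics package rather than a hand-written fitting loop.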
