The conundrum of sharing research data

"We must all accept that science is data and that data are science, and thus provide for, and justify the need for the support of, much-improved data curation." (Hanson, Sugden, & Alberts)

Researchers are producing an unprecedented deluge of data by using new methods and instrumentation. Others may wish to mine these data for new discoveries and innovations. However, research data are not readily available, because sharing is common in only a few fields, such as astronomy and genomics. Data sharing practices in other fields vary widely. Moreover, research data take many forms, are handled in many ways, using many approaches, and often are difficult to interpret once removed from their initial context. Data sharing is thus a conundrum. Four rationales for sharing data are examined, drawing examples from the sciences, social sciences, and humanities: (1) to reproduce or to verify research, (2) to make results of publicly funded research available to the public, (3) to enable others to ask new questions of extant data, and (4) to advance the state of research and innovation. These rationales differ by the arguments for sharing, by beneficiaries, and by the motivations and incentives of the many stakeholders involved. The challenges are to understand which data might be shared, by whom, with whom, under what conditions, why, and to what effects. Answers will inform data policy and practice. © 2012 Wiley Periodicals, Inc.

[1]  R. Merton Behavior Patterns Of Scientists , 1970, American scientist.

[2]  A. Lyon Dealing with data , 1970 .

[3]  R. Merton The Normative Structure of Science , 1973 .

[4]  H. M. Collins,et al.  The Seven Sexes: A Study in the Sociology of a Phenomenon, or the Replication of Experiments in Physics , 1975 .

[5]  B. Latour,et al.  Laboratory Life: The Construction of Scientific Facts , 1979 .

[6]  B. Latour,et al.  Laboratory Life: The Social Construction of Scientific Facts , 1983 .

[7]  H. M. Collins,et al.  The Sociology of Scientific Knowledge: Studies of Contemporary Science , 1983 .

[8]  S. Fienberg,et al.  Sharing research data , 1985 .

[9]  B. Latour Science in action : how to follow scientists and engineers through society , 1989 .

[10]  Michael K. Buckland,et al.  Information as Thing , 1991 .

[11]  Etienne Wenger,et al.  Situated Learning: Legitimate Peripheral Participation , 1991 .

[12]  J. I. Morrison A natural experiment. , 1992 .

[13]  S. Hilgartner,et al.  Data Access, Ownership, and Control , 1994 .

[14]  L. Nelson Data, data everywhere. , 1997, Critical care medicine.

[15]  H. Collins The Meaning of Data: Open and Closed Evidential Cultures in the Search for Gravitational Waves , 1998, American Journal of Sociology.

[16]  Merriam-Webster Merriam-Webster's Collegiate Dictionary , 1998 .

[17]  Etienne Wenger,et al.  Communities of Practice: Learning, Meaning, and Identity , 1998 .

[18]  E. Wenger Communities of Practice: Learning, Meaning, and Identity , 1998 .

[19]  K. Knorr-Cetina,et al.  Epistemic cultures : how the sciences make knowledge , 1999 .

[20]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[21]  Bertram C. Bruce,et al.  Modeling Distributed Knowledge Processes in Next Generation Multidisciplinary Alliances , 2000, Proceedings Academia/Industry Working Conference on Research Challenges '00. Next Generation Enterprises: Virtual Organizations and Mobile/Pervasive Technologies. AIWORC'00.

[22]  Sanjeev Khanna,et al.  Data Provenance: Some Basic Issues , 2000, FSTTCS.

[23]  Bertram C. Bruce,et al.  Modeling Distributed Knowledge Processes in Next Generation Multidisciplinary Alliances , 2000, Inf. Syst. Frontiers.

[24]  J. Emonds Summary of principles , 2001 .

[25]  G. Brumfiel Misconduct finding at Bell Labs shakes physics community , 2002, Nature.

[26]  S. Hilgartner,et al.  Data withholding in academic genetics: evidence from a national survey. , 2002, JAMA.

[27]  S. Hilgartner Acceptable intellectual property. , 2002, Journal of molecular biology.

[28]  T. Data,et al.  The Genius of Intellectual Property and the Need for the Public Domain , 2003 .

[29]  T. Data,et al.  Scientific Knowledge as a Global Public Good: Contributions to Innovation and the Economy , 2003 .

[30]  P. Uhlir,et al.  A Contractually Reconstructed Research Commons for Scientific Data in a Highly Protectionist Intellectual Property Environment , 2003 .

[31]  Paul F. Uhlir,et al.  The Role of Scientific and Technical Data and Information in the Public Domain , 2003 .

[32]  Anne E. Trefethen,et al.  The Data Deluge: An e-Science Perspective , 2003 .

[33]  Nancy A. Van House,et al.  Science and technology studies and information studies , 2005, Annu. Rev. Inf. Sci. Technol..

[34]  C. Gobler,et al.  Nutrient limitation, organic matter cycling, and plankton dynamics during an Aureococcus anophagefferens bloom , 2004 .

[35]  Henry S. Rzepa,et al.  The Next Big Thing: From Hypermedia to Datuments , 2004, J. Digit. Inf..

[36]  Julie Esanu,et al.  Open Access and the Public Domain in Digital Data and Information for Science: Proceedings of an International Symposium , 2004 .

[37]  Carole L. Palmer,et al.  Scholarly work and the shaping of digital access , 2005 .

[38]  David J. DeWitt,et al.  Scientific data management in the coming decade , 2005, SGMD.

[39]  Carole L. Palmer,et al.  Scholarly work and the shaping of digital access , 2005, J. Assoc. Inf. Sci. Technol..

[40]  Philip E. Bourne,et al.  Will a Biological Database Be Different from a Biological Journal? , 2005, PLoS Comput. Biol..

[41]  Sarita Albagli,et al.  Memory Practices in the Sciences , 2008 .

[42]  Geoffrey C. Bowker,et al.  Comparative interoperability project: configurations of community, technology, organization , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).

[43]  Carsten S. Østerlund,et al.  Relations in Practice: Sorting Through Practice Theories on Knowledge Sharing in Complex Organizations , 2005, Inf. Soc..

[44]  Declan Butler,et al.  Mashups mix data into global service , 2006, Nature.

[45]  J. Unsworth Our Cultural Commonwealth: The report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences , 2006 .

[46]  Ladislav Chodil Open Content Alliance , 2006 .

[47]  D. Normile,et al.  South Korean Team's Remaining Human Stem Cell Claim Demolished , 2006, Science.

[48]  J. Couzin,et al.  Cleaning Up the Paper Trail , 2006, Science.

[49]  Paul A. David,et al.  Towards a cyberinfrastructure for enhanced scientific collaboration: Providing its 'soft' foundations may be the hardest part , 2006 .

[50]  Noel Enyedy,et al.  Building Digital Libraries for Scientific Data: An Exploratory Study of Data Practices in Habitat Ecology , 2006, ECDL.

[51]  Helena Karasti,et al.  Enriching the Notion of Data Curation in E-Science: Data Managing and Information Infrastructuring in the Long Term Ecological Research (LTER) Network , 2006, Computer Supported Cooperative Work (CSCW).

[52]  Thomas A. Finholt,et al.  Tensions across the scales: planning infrastructure for the long-term , 2007, GROUP.

[53]  C. Borgman Scholarship in the Digital Age: Information, Infrastructure, and the Internet , 2007 .

[54]  Noel Enyedy,et al.  Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries , 2007, International Journal on Digital Libraries.

[55]  Matthew S. Mayernik,et al.  Drowning in data: digital library architecture to support scientific use of embedded sensor networks , 2007, JCDL '07.

[56]  Ann Zimmerman,et al.  Not by metadata alone: the use of diverse forms of knowledge to locate data for reuse , 2007, International Journal on Digital Libraries.

[57]  Linda C. Smith,et al.  An Educational Program on Data Curation , 2007 .

[58]  Heather A. Piwowar,et al.  Sharing Detailed Research Data Is Associated with Increased Citation Rate , 2007, PloS one.

[59]  Jane Hunter,et al.  Provenance Explorer-a graphical interface for constructing scientific publication packages from provenance trails , 2007, International Journal on Digital Libraries.

[60]  Geoffrey C. Bowker,et al.  Organizing for Multidisciplinary Collaboration: The Case of the Geosciences Network , 2008 .

[61]  J. Kaiser Uncle Sam's Biomedical Archive Wants Your Papers , 2008, Science.

[62]  Heather A. Piwowar,et al.  Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers , 2008, PLoS medicine.

[63]  Christopher Kelty,et al.  Two Bits: The Cultural Significance of Free Software , 2008 .

[64]  G. Olson,et al.  Scientific Collaboration on the Internet , 2008 .

[65]  B. Library Patterns of information use and exchange: case studies of researchers in the life sciences , 2009 .

[66]  Christine L. Borgman,et al.  The Digital Future is Now: A Call to Action for the Humanities , 2009, Digit. Humanit. Q..

[67]  Allen H. Renear,et al.  Strategic Reading, Ontologies, and the Future of Scientific Publishing , 2009, Science.

[68]  John Wilbanks,et al.  I have seen the paradigm shift, and it is us , 2009, The Fourth Paradigm.

[69]  John Kunze,et al.  Preservation Is Not a Place , 2009, Int. J. Digit. Curation.

[70]  Carl Lagoze,et al.  The Value of New Scientific Communication Models for Chemistry , 2009 .

[71]  Martin Pilgram,et al.  Consultative Committee For Space Data Systems , 2009 .

[72]  Melissa H. Cragin,et al.  Constructing Data Curation Profiles , 2009, Int. J. Digit. Curation.

[73]  Jelena Kovacevic,et al.  Reproducible research in signal processing , 2009, IEEE Signal Process. Mag..

[74]  Victoria Stodden,et al.  The Legal Framework for Reproducible Scientific Research: Licensing and Copyright , 2009, Computing in Science & Engineering.

[75]  Victoria Stodden,et al.  Enabling Reproducible Research: Open Licensing for Scientific Innovation , 2009 .

[76]  Carole A. Goble,et al.  The impact of workflow tools on data-centric research , 2009, The Fourth Paradigm.

[77]  Tony Hey,et al.  The Fourth Paradigm: Data-Intensive Scientific Discovery , 2009 .

[78]  Gordon Bell,et al.  Beyond the Data Deluge , 2009, Science.

[79]  J. Young Physicists Set Plan in Motion to Change Publishing System. , 2009 .

[80]  Geoffrey C. Bowker,et al.  Towards a virtual organization for data cyberinfrastructure , 2009, JCDL '09.

[81]  Victoria Stodden,et al.  Reproducible Research , 2019, The New Statistics with R.

[82]  G. B. Dalrymple,et al.  Climate change and the integrity of science. , 2010, Science.

[83]  John H. Porter,et al.  A Brief History of Data Sharing in the U.S. Long Term Ecological Research Network , 2010 .

[84]  D. Rubinfeld Sustainable Economics for a Digital Planet: Ensuring Long-term Access to Digital Information , 2010 .

[85]  Michel Beaudouin-Lafon Open access to scientific publications , 2010, Commun. ACM.

[86]  M. Whitlock,et al.  The need for archiving data in evolutionary biology , 2010, Journal of evolutionary biology.

[87]  Michael Witt,et al.  Data sharing, small science and institutional repositories , 2010, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[88]  Matthew S. Mayernik,et al.  From artifacts to aggregations: Modeling scientific life cycles on the semantic Web , 2009, J. Assoc. Inf. Sci. Technol..

[89]  Matthew S. Mayernik,et al.  Digital libraries for scientific data discovery and reuse: from vision to practical reality , 2010, JCDL '10.

[90]  Simone Sacchi,et al.  Definitions of dataset in the scientific and technical literature , 2010, ASIST.

[91]  Eric C. Kansa,et al.  Googling the Grey: Open Data, Web Services, and Semantics , 2010 .

[92]  E. Aronova,et al.  Big Science and Big Data in Biology: From the International Geophysical Year through the International Biological Program to the Long Term Ecological Research (LTER) Network, 1957–Present , 2010 .

[93]  P. N. Edwards A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming , 2010 .

[94]  D. Kleppner Ensuring the integrity, accessibility, and stewardship of research data in the digital age , 2010 .

[95]  Ixchel M. Faniel,et al.  Reusing Scientific Data: How Earthquake Engineering Researchers Assess the Reusability of Colleagues’ Data , 2010, Computer Supported Cooperative Work (CSCW).

[96]  Paul T. Groth,et al.  Provenance XG Final Report , 2010 .

[97]  C. Haeussler Information-Sharing in Academia and the Industry: A Comparative Study , 2010 .

[98]  Christine L. Borgman,et al.  Research Data: Who Will Share What, with Whom, When, and Why? , 2010 .

[99]  Wendy W. Chapman,et al.  Public sharing of research datasets: A pilot study of associations , 2010, J. Informetrics.

[100]  Michael J. Zigmond,et al.  The Essential Nature of Sharing in Science , 2010, Sci. Eng. Ethics.

[101]  J. Couzin-Frankel Cancer research. As questions grow, Duke halts trials, launches investigation. , 2010, Science.

[102]  Florence Millerand,et al.  Infrastructure Time: Long-term Matters in Collaborative Development , 2010, Computer Supported Cooperative Work (CSCW).

[103]  Laura Wynholds,et al.  Linking to Scientific Data: Identity Problems of Unruly and Poorly Bounded Digital Objects , 2011, Int. J. Digit. Curation.

[104]  Christine L Borgman,et al.  Science friction: Data, metadata, and collaboration , 2011, Social studies of science.

[105]  B. Santer,et al.  The Reproducibility of Observational Estimates of Surface and Atmospheric Temperature Change , 2011, Science.

[106]  M. Tomasello,et al.  Methodological Challenges in the Study of Primate Cognition , 2011, Science.

[107]  Xiao-Li Meng Multi-Party Inference and Uncongeniality , 2011, International Encyclopedia of Statistical Science.

[108]  B. Jasny,et al.  Again, and Again, and Again … , 2011 .

[109]  J. Overpeck,et al.  Climate Data Challenges in the 21st Century , 2011, Science.

[110]  Matthew S. Mayernik,et al.  Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators , 2011 .

[111]  Matthew S. Mayernik,et al.  How institutional factors influence the creation of scientific metadata , 2011, iConference '11.

[112]  John P A Ioannidis,et al.  Improving Validation Practices in “Omics” Research , 2011, Science.

[113]  M. Ryan Replication in Field Biology: The Case of the Frog-Eating Bat , 2011, Science.

[114]  Anthony J. G. Hey,et al.  The Fourth Paradigm: Data-Intensive Scientific Discovery [Point of View] , 2011 .

[115]  Bruce Alberts,et al.  Making Data Maximally Available , 2011, Science.

[116]  D. Boyd,et al.  Six Provocations for Big Data , 2011 .

[117]  A. Costello,et al.  Global health and climate change: moving from denial and catastrophic fatalism to positive action , 2011, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[118]  Christine L. Borgman,et al.  When use cases are not useful: data practices, astronomy, and digital libraries , 2011, JCDL '11.

[119]  M. Whitlock Data archiving in ecology and evolution: best practices. , 2011, Trends in ecology & evolution.

[120]  R. Peng Reproducible Research in Computational Science , 2011, Science.

[121]  G. Naik Scientists' Elusive Goal: Reproducing Study Results , 2011 .

[122]  S. Djorgovski,et al.  The Discovery and Nature of the Optical Transient CSS100217:102913+404220 , 2011, 1103.5514.

[123]  Christine L Borgman,et al.  Why are the attribution and citation of scientific data important? In: Uhlir, Paul and Cohen, Daniel (eds.). Report from Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop. , 2012 .

[124]  S. Djorgovski,et al.  Sky Surveys , 2012, 1203.5111.

[125]  G. Alter Response to RFI: 'Public Access to Digital Data Resulting From Federally Funded Scientific Research' Office of Science and Technology Policy , 2012 .

[126]  Inês Ferreira dos Santos Videira,et al.  Mechanisms regulating melanogenesis , 2013, Anais brasileiros de dermatologia.