Beyond the Data Deluge: A Research Agenda for Large-Scale Data Sharing and Reuse

There is almost universal agreement that scientific data should be shared for uses beyond the purposes for which they were initially collected. Access to data enables system-level science, expands the instruments and products of research to new communities, and advances solutions to complex human problems. While demands for data are not new, the vision of open access to data is increasingly ambitious: the aim is to make data accessible and usable to anyone, anytime, anywhere, and for any purpose. Until recently, scholarly investigations of data sharing and reuse were sparse. They have become more common as technology and instrumentation have advanced, as policies that mandate sharing have been implemented, and as research has become more interdisciplinary. Each of these factors has contributed to what is commonly called the "data deluge." Most discussions of increases in the scale of sharing and reuse have focused on growing amounts of data. Three other scale-related issues in open access to data have received less attention: broader participation in data sharing and reuse, increases in the number and types of intermediaries, and a growing number of digital data products. The purpose of this paper is to develop a research agenda for scientific data sharing and reuse that considers these three areas.
