The analytic potential of scientific data: Understanding re-use value

While problems related to the curation and preservation of scientific data are receiving considerable attention from the information science and digital repository communities, relatively little progress has been made on methods for assessing the value of data in order to inform investment in acquisition, curation, and preservation. Adapting Hjørland's concept of the “epistemological potential” of documents, we assert that analytic potential, or the value of data for analysis beyond its original use, should guide the development of data collections for repositories aimed at supporting research. Three key aspects of the analytic potential of data are identified and discussed: preservation readiness, potential user communities, and fit for purpose. Drawing on evidence from research conducted within the Data Conservancy initiative, we demonstrate how the analytic potential of data can be determined and applied to build large-scale data collections suited for grand challenge science.
