Managing Research Data in Big Science

The project that led to this report was funded by JISC in 2010–2011, as part of its 'Managing Research Data' programme, to examine how big science data is managed and to produce recommendations where appropriate. Big science data is different: it comes in large volumes, and it is shared and exploited in ways which may differ from other disciplines. This project has explored these differences using as a case study the gravitational wave data generated by the LIGO Scientific Collaboration (LSC), and has produced recommendations intended to be useful variously to JISC, the funding council (STFC) and the LSC community. In Sect. 1 we define what we mean by 'big science' and describe its overall data culture, laying stress on how it necessarily or contingently differs from that of other disciplines. In Sect. 2 we discuss the benefits of a formal data-preservation strategy, and the cases for open data and for well-preserved data that follow from those benefits. This leads to our recommendations: in essence, funders should adopt rather light-touch prescriptions regarding data preservation planning. Normal data management practice in the areas under study corresponds to notably good practice in most other areas, so the only change we suggest is to make this planning more formal, which makes it more easily auditable and more amenable to constructive criticism. In Sect. 3 we briefly discuss the LIGO data management plan, and pull together the available information on the estimation of digital preservation costs. The report is informed throughout by the OAIS reference model for an open archive.
