Data Management and Preservation Planning for Big Science

‘Big Science’ (that is, science involving large collaborations with dedicated facilities, large data volumes and multinational investments) is often seen as different when it comes to data management and preservation planning. Big Science handles its data differently from other disciplines, and its data management problems are qualitatively different. In part, these differences arise from the quantities of data involved, but perhaps more importantly from the cultural, organisational and technical distinctiveness of these academic communities. Consequently, the data management systems are typically, and rationally, bespoke; but this means that planning for data management and preservation (DMP) must also be bespoke. Given these differences, ‘just read and implement the OAIS specification’ is reasonable DMP advice. However, this bald prescription can and should be supported by a methodological ‘toolkit’, including overviews, case studies and costing models, which provides guidance on developing best practice in DMP policy and infrastructure for these projects and also addresses OAIS validation, audit and cost modelling. In this paper, we build on previous work with the LIGO collaboration to consider the role of DMP planning within these Big Science scenarios, and discuss how to apply current best practice. We discuss the results of the MaRDI-Gross project (Managing Research Data Infrastructures – Big Science), which has been developing a toolkit that provides guidelines on applying best practice in DMP planning within Big Science projects. The toolkit is targeted primarily at projects’ engineering managers, but is also intended to help funders collaborate on DMP plans that satisfy the requirements imposed on them.
