Repeatability and Re-usability in Scientific Processes: Process Context, Data Identification and Verification

e-Science offers huge potential for speeding up scientific discovery by enabling researchers to flexibly re-use, combine and build on top of existing tools and results. Yet, to reap these benefits we must be able to actually perform these activities, i.e., the data, processing components, etc. must be available for redeployment, and we must be able to trust them. Repeatability of e-Science experiments is thus a prerequisite for validating work and establishing trust in results. This proves challenging, as the procedures currently in place are not set up to meet these goals. Several approaches have tackled this issue from various angles. This paper reviews these building blocks and ties them together. It starts with the capture and description of entire research processes and ways to document them. Regarding data, we review the recommendations of the Research Data Alliance (RDA) on how to precisely identify arbitrary subsets of potentially high-volume and highly dynamic data used in a process. Lastly, we present mechanisms for verifying the correctness of process re-executions.
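The data identification and verification steps can be illustrated with a minimal sketch in Python. It assumes the core ideas behind the RDA dynamic data citation recommendations: records are versioned and timestamped rather than overwritten; the query that selects a subset is normalized and stored together with its execution timestamp and a hash of its stably sorted result set; and a persistent identifier (PID) is assigned to that stored query. Re-executing the stored query as of its timestamp and comparing result hashes then checks that exactly the same subset is retrieved. All class and function names, the key-prefix query format and the PID scheme are hypothetical and chosen for illustration only. A real deployment would sit on a database and an actual PID service.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Version:
    key: str
    value: str
    valid_from: int                 # timestamp at which this version was added
    valid_to: Optional[int] = None  # timestamp at which it was superseded/deleted


class VersionedTable:
    """Hypothetical append-only store: updates and deletes never erase old versions."""

    def __init__(self) -> None:
        self.versions: List[Version] = []

    def _close_current(self, key: str, ts: int) -> None:
        # Mark the currently valid version of `key` (if any) as ending at `ts`.
        for v in self.versions:
            if v.key == key and v.valid_to is None:
                v.valid_to = ts

    def upsert(self, key: str, value: str, ts: int) -> None:
        self._close_current(key, ts)
        self.versions.append(Version(key, value, ts))

    def delete(self, key: str, ts: int) -> None:
        self._close_current(key, ts)

    def select(self, query: Dict[str, str], as_of: int) -> List[Tuple[str, str]]:
        """Rows matching a key-prefix filter as they were at `as_of`,
        sorted so that the result order (and hence its hash) is stable."""
        prefix = query.get("key_prefix", "")
        rows = [
            (v.key, v.value)
            for v in self.versions
            if v.valid_from <= as_of
            and (v.valid_to is None or v.valid_to > as_of)
            and v.key.startswith(prefix)
        ]
        return sorted(rows)


def normalize(query: Dict[str, str]) -> str:
    # Canonical form so that semantically identical queries compare equal.
    return json.dumps(query, sort_keys=True, separators=(",", ":"))


def result_hash(rows: List[Tuple[str, str]]) -> str:
    h = hashlib.sha256()
    for key, value in rows:
        h.update(f"{key}={value}\n".encode("utf-8"))
    return h.hexdigest()


class QueryStore:
    """Stores (normalized query, timestamp, result hash) under a PID."""

    def __init__(self, table: VersionedTable) -> None:
        self.table = table
        self.entries: Dict[str, Tuple[str, int, str]] = {}

    def cite(self, query: Dict[str, str], ts: int) -> str:
        norm = normalize(query)
        rhash = result_hash(self.table.select(query, ts))
        for pid, (n, _, h) in self.entries.items():
            if n == norm and h == rhash:
                return pid  # identical query and unchanged result: reuse the PID
        # Hypothetical PID scheme derived from the normalized query and timestamp.
        pid = "pid:" + hashlib.sha256(f"{norm}@{ts}".encode("utf-8")).hexdigest()[:12]
        self.entries[pid] = (norm, ts, rhash)
        return pid

    def resolve(self, pid: str) -> Tuple[List[Tuple[str, str]], bool]:
        # Re-execute the stored query as of its timestamp and verify the hash.
        norm, ts, rhash = self.entries[pid]
        rows = self.table.select(json.loads(norm), ts)
        return rows, result_hash(rows) == rhash


if __name__ == "__main__":
    table = VersionedTable()
    table.upsert("station-A/1", "12.5", ts=1)
    table.upsert("station-A/2", "13.0", ts=1)
    store = QueryStore(table)
    pid = store.cite({"key_prefix": "station-A"}, ts=2)
    table.upsert("station-A/1", "99.9", ts=3)  # later correction of a value
    table.delete("station-A/2", ts=4)          # later deletion
    rows, ok = store.resolve(pid)
    print(rows, ok)  # the subset as cited at ts=2, verified against its hash
```

The design choice here is that identification is attached to the query and its timestamp rather than to a copy of the extracted subset, so the dataset can keep changing while any earlier subset remains retrievable and verifiable.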