The METACLIP semantic provenance framework for climate products

Abstract: Effective handling of data provenance is a necessary condition for reproducibility, and it helps build trust and credibility in research outcomes and the data products delivered. METACLIP (METAdata for CLImate Products) is a language-independent framework designed to address the description of climate product provenance. The solution is semantic: it uses the web standard Resource Description Framework (RDF) and builds on domain-specific extensions of standard vocabularies (e.g., PROV-O) that describe the different aspects involved in climate product generation. We illustrate METACLIP through an example application within the open source R computing environment, generating a climate product for which full provenance information is recorded. Finally, we present the METACLIP Interpreter, a web-based interactive front-end for metadata visualization that helps users with different levels of expertise to trace and understand the provenance of a wide variety of climate data products, and to fully reproduce them.
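As a rough illustration of the idea of describing provenance with RDF triples and PROV-O terms, the following minimal sketch records how a hypothetical climate product was derived and then traces it back to its source. It is not the METACLIP implementation or vocabulary: the `ex:` namespace, the resource names, and the helper functions `to_ntriples` and `trace_sources` are assumptions made for this example. Only the `prov:` and `rdf:` IRIs are standard W3C terms.

```python
# Illustrative sketch only. The EX namespace and all resource names below are
# hypothetical; they are NOT terms from the METACLIP vocabularies.
PROV = "http://www.w3.org/ns/prov#"                    # W3C PROV-O namespace
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"    # W3C RDF namespace
EX = "http://example.org/climate#"                     # hypothetical namespace

# Each statement is a (subject, predicate, object) triple of IRIs.
triples = [
    (EX + "dataset", RDF + "type", PROV + "Entity"),
    (EX + "climateIndexCalculation", RDF + "type", PROV + "Activity"),
    (EX + "climateIndexCalculation", PROV + "used", EX + "dataset"),
    (EX + "map", RDF + "type", PROV + "Entity"),
    (EX + "map", PROV + "wasGeneratedBy", EX + "climateIndexCalculation"),
    (EX + "map", PROV + "wasDerivedFrom", EX + "dataset"),
]


def to_ntriples(statements):
    """Serialize IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


def trace_sources(product, statements):
    """Follow prov:wasDerivedFrom links transitively back from a product."""
    sources, frontier = [], [product]
    while frontier:
        node = frontier.pop()
        for s, p, o in statements:
            if s == node and p == PROV + "wasDerivedFrom" and o not in sources:
                sources.append(o)
                frontier.append(o)
    return sources


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(trace_sources(EX + "map", triples))
```

Plain tuples are used here to keep the sketch free of dependencies; a real application would more likely rely on an RDF library and on the domain-specific vocabularies the framework defines.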
