Facilitating improved access and integrated use of data: a case study using the AWRA-L dataset

Availability and publication of data are increasing greatly throughout the scientific community. However, discovering, accessing and using the data remain a challenge, particularly within the modelling and data environments used at the user end. On the data consumption end, scientists still have to manually access, download, interpret, wrangle, process and clean data before any analysis relevant to their research can begin. Access to and use of scientific data therefore need to become more seamless as research activities become more data-intensive. On the data supply end, publishers and providers are publishing data, but through heterogeneous platforms and distribution channels, and in a wide range of formats and data models. There seems to be a gap between how data is accessed and how it is published, and we argue that this gap ought to be narrowed.

In this paper, we propose a methodology for assessing the quality of a dataset's publication arrangement and for implementing the recommendations arising from that assessment. The assessment component uses the 5-star data self-assessment tool developed under the OzNome initiative, which implements concrete questions based on the FORCE11 FAIR data guiding principles. We apply the methodology in a case study that uses the Bureau of Meteorology's AWRA-L data as a running example. Using the AWRA-L data, we present a summary of the assessment, candidate recommendations to address the identified gaps, and a summary of the implementations that address them. We then show how the outputs of these implementations can be leveraged in the modelling environment for AWRA-L, using Jupyter Python notebooks as an example. The paper also explores specific tools and approaches for improving the access and interoperability of datasets in the earth and environmental sciences, particularly gridded datasets, as part of examining improvements to recommended parts of the data supply chain.
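The assessment relies on the OzNome 5-star tool, whose actual questions and scoring are not given here. As a rough illustration only, the following Python sketch shows how answers to questions grouped by FAIR principle might be aggregated into a per-principle star rating. The question identifiers, weights and rounding rule are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qid: str        # hypothetical short identifier
    principle: str  # FAIR principle: "F", "A", "I" or "R"
    text: str
    max_score: int


# Hypothetical questions; NOT the OzNome tool's actual question set.
QUESTIONS = [
    Question("pid", "F", "Does the dataset have a persistent identifier?", 2),
    Question("catalogue", "F", "Is it indexed in a searchable catalogue?", 1),
    Question("protocol", "A", "Can it be retrieved via a standard protocol?", 2),
    Question("vocab", "I", "Are its properties defined in a controlled vocabulary?", 2),
    Question("licence", "R", "Is a licence stated?", 1),
]


def principle_scores(answers: dict[str, int]) -> dict[str, float]:
    """Return the fraction of available points earned per FAIR principle.

    Unanswered questions score zero.
    """
    earned: dict[str, int] = {}
    possible: dict[str, int] = {}
    for q in QUESTIONS:
        score = answers.get(q.qid, 0)
        if not 0 <= score <= q.max_score:
            raise ValueError(f"score {score} out of range for {q.qid!r}")
        earned[q.principle] = earned.get(q.principle, 0) + score
        possible[q.principle] = possible.get(q.principle, 0) + q.max_score
    return {p: earned[p] / possible[p] for p in possible}


def stars(fraction: float) -> int:
    """Map a 0..1 fraction onto 0..5 stars, rounding half up (hypothetical rule)."""
    return math.floor(fraction * 5 + 0.5)


if __name__ == "__main__":
    answers = {"pid": 2, "catalogue": 0, "protocol": 2, "vocab": 1}
    for principle, fraction in principle_scores(answers).items():
        print(principle, stars(fraction))
    # F 3
    # A 5
    # I 3
    # R 0
```

Reporting a rating per principle, rather than a single overall score, is one way to point a data provider at the specific gaps (here, the missing licence) that recommendations should address.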
In particular, methods previously used in eReefs were implemented for AWRA-L to improve the binding between reference metadata, controlled vocabularies of the observable and modelled properties referenced, and the actual data. These methods leveraged tools and approaches such as Linked Data, a vocabulary registry and web services. Applying them produced a set of AWRA-L reference metadata that were key to integrating the data with the conceptual definitions of the modelled properties it references. The governance and operationalization of the AWRA-L reference metadata are being investigated as part of future work. More broadly, the methodology presented in this paper serves as a general approach to assessing and monitoring the quality of a dataset's delivery and access arrangement. It gives data providers concrete steps they can take to improve their data provision arrangements, and it gives data users information on a dataset's properties and an indication of its provision arrangements.
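The binding described above links variables in the data to concepts in a controlled vocabulary published as Linked Data. The Python sketch below illustrates the idea by emitting Turtle that describes a modelled property as a SKOS concept and links a dataset variable to it. Only the SKOS namespace is standard; the base URI, the linking property `ex:modelledProperty`, and the concept and variable names are hypothetical and do not reflect the published AWRA-L vocabulary. In practice a library such as rdflib would normally handle serialisation; plain strings keep the sketch dependency-free.

```python
# SKOS is the W3C standard namespace; the EX base URI, the concept and the
# variable names below are hypothetical placeholders.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/awra-l/"

Triple = tuple[str, str, str]


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and newlines for a Turtle "..." literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def concept_triples(concept_id: str, label: str, definition: str) -> list[Triple]:
    """Describe a modelled property as a SKOS concept in a controlled vocabulary."""
    c = f"<{EX}concept/{concept_id}>"
    return [
        (c, "a", "skos:Concept"),
        (c, "skos:prefLabel", f'"{escape_literal(label)}"@en'),
        (c, "skos:definition", f'"{escape_literal(definition)}"@en'),
    ]


def variable_binding(variable: str, concept_id: str) -> list[Triple]:
    """Link a dataset variable to its concept; ex:modelledProperty is hypothetical."""
    return [(f"<{EX}variable/{variable}>", "ex:modelledProperty",
             f"<{EX}concept/{concept_id}>")]


def to_turtle(triples: list[Triple]) -> str:
    """Serialise triples as Turtle, one statement per line."""
    lines = [f"@prefix skos: <{SKOS}> .", f"@prefix ex: <{EX}> .", ""]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


if __name__ == "__main__":
    triples = concept_triples("soil-moisture", "Soil moisture",
                              "Water held in the soil profile.")
    triples += variable_binding("soil_moisture", "soil-moisture")
    print(to_turtle(triples))
```

Because the variable points at a concept URI rather than carrying only a free-text name, a client can dereference that URI to retrieve the property's label and definition.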