Semantic validation of environmental observations data

Standardised markup languages are commonly used to communicate data from multiple sources, facilitating data access, integration and analysis. However, such languages generally fail to capture the constraints imposed on data contents, which makes semantic validation of data encoded in them difficult. In this paper, we focus on one such language, the Water Data Transfer Format (WDTF), and present an approach for validating WDTF data contents. We show that, using Semantic Web technologies, we can express the constraints not captured by WDTF and check data for consistency with respect to them through a query reduction technique that allows existing reasoners to be used for data validation. Finally, we report on an experimental study of our approach and discuss some common errors in the data exposed by the study.

We propose a semantic approach for validating environmental data encoded in the Water Data Transfer Format (WDTF). We provide a detailed description of the validation process, which includes constructing an ontology for WDTF, representing data and constraints in OWL/SWRL, and validating data with respect to the constraints through a query reduction technique. We evaluate our approach by implementing a prototype and testing it on real WDTF data.
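The abstract does not spell out the query reduction technique, so the sketch below does not reproduce it. It only illustrates the general idea that constraints a schema cannot express can be read under a closed-world assumption and checked as "violation queries" over instance data. It assumes data as plain (subject, predicate, object) triples and uses only the Python standard library, with no OWL/SWRL reasoner. The class and property names (`wdtf:Observation`, `wdtf:unit`, `wdtf:qualityCode`) and the quality-code vocabulary are hypothetical placeholders, not actual WDTF terms.

```python
from dataclasses import dataclass
from typing import Callable

# A triple is (subject, predicate, object). All vocabulary terms used below
# are hypothetical placeholders, not real WDTF element names.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Constraint:
    name: str
    # A "violation query": returns the subjects that break the constraint.
    violations: Callable[[set[Triple]], set[str]]


def instances_of(data: set[Triple], cls: str) -> set[str]:
    return {s for (s, p, o) in data if p == "rdf:type" and o == cls}


def values(data: set[Triple], subject: str, prop: str) -> list[str]:
    return [o for (s, p, o) in data if s == subject and p == prop]


def exactly_one(cls: str, prop: str) -> Constraint:
    # Closed-world reading: a missing value is a violation rather than
    # "unknown", and two distinct values are also a violation.
    def check(data: set[Triple]) -> set[str]:
        return {x for x in instances_of(data, cls)
                if len(values(data, x, prop)) != 1}
    return Constraint(f"{cls} has exactly one {prop}", check)


def value_in(cls: str, prop: str, allowed: frozenset[str]) -> Constraint:
    # Every value of prop on an instance of cls must come from the vocabulary.
    def check(data: set[Triple]) -> set[str]:
        return {x for x in instances_of(data, cls)
                if any(v not in allowed for v in values(data, x, prop))}
    return Constraint(f"{cls}.{prop} drawn from vocabulary", check)


def validate(data: set[Triple],
             constraints: list[Constraint]) -> dict[str, set[str]]:
    # Map each violated constraint to the subjects that violate it.
    report: dict[str, set[str]] = {}
    for c in constraints:
        bad = c.violations(data)
        if bad:
            report[c.name] = bad
    return report


# Usage with made-up sample data.
data = {
    ("obs1", "rdf:type", "wdtf:Observation"),
    ("obs1", "wdtf:unit", "m3/s"),
    ("obs1", "wdtf:qualityCode", "good"),
    ("obs2", "rdf:type", "wdtf:Observation"),
    ("obs2", "wdtf:qualityCode", "bogus"),
}
constraints = [
    exactly_one("wdtf:Observation", "wdtf:unit"),
    value_in("wdtf:Observation", "wdtf:qualityCode",
             frozenset({"good", "poor", "estimated"})),
]
# obs2 is reported for both constraints: it has no unit and an unknown code.
print(validate(data, constraints))
```

Treating each constraint as a query that returns offending individuals, rather than as an axiom that a reasoner may satisfy by inferring new facts, is what makes a missing unit count as an error here. Under OWL's open-world semantics, the same cardinality axiom would not flag `obs2`.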
