Analysis of quality metadata in the GEOSS Clearinghouse

The Global Earth Observation System of Systems (GEOSS) Clearinghouse is the part of the GEOSS Common Infrastructure (GCI) that supports discovery of the data made available by Group on Earth Observations (GEO) members and participating organizations in GEOSS. It also acts as a unified metadata catalogue that stores complete metadata records, not only about datasets but also about other kinds of components and services. Users explore these records to find data that are fit for use. Quality indicators and provenance are included in the metadata and are potentially useful variables that allow users to make an informed decision without having to download and assess the data themselves. However, no previous study has examined the completeness and correctness of the metadata records in the Clearinghouse. The objective of this paper is to analyze the data quality information distributed by the GEOSS Clearinghouse. The aim is to quantify its completeness and to provide clues on how the current status of the Clearinghouse could be improved and how useful quality-aware tools could be. The methodology consists of first harvesting the Clearinghouse and then quantifying the quality information found in 97203 metadata records using a semi-automatic approach. The results reveal that the inclusion of quality information in metadata records is not rare: 19.66% of the metadata records contain some quality element. However, this practice is not general enough and several aspects could be improved. For instance, 77.78% of quantitative measures lack measurement units. When quality indicators are not sufficient, lineage metadata could be used to mitigate this situation by analysing the process steps and sources used to create a dataset. However, even though lineage is reported in 15.55% of the records, only 1.27% of the cases return a complete list of process steps with sources.
This paper also provides indications on what is lacking in the current producer metadata model and detects a gap in usage or user feedback metadata in GEOSS. Moreover, information extracted from GeoViQua interviews with users indicates that they value informal comments and user feedback on datasets as a complement to the more formal producer-oriented metadata description of the data. Although many efforts within the scientific community and the Quality Assurance Framework for Earth Observation (QA4EO) group have been invested in describing how to parameterize data quality and uncertainty, we conclude that further work can still be done to provide complete quality information in the metadata catalogues. In brief, since the GEOSS Clearinghouse references data from the most important agencies and research organizations, the results presented in this paper provide a perspective on how well quality is disseminated in the Earth observation community in general.
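
The counting step of such an analysis can be illustrated with a minimal sketch. It is not the procedure used in the paper, which is described only as semi-automatic. The sketch assumes the harvested records are ISO 19139 XML documents (the `gmd` encoding of ISO 19115), that "some quality element" can be approximated by the presence of `gmd:dataQualityInfo`, and that a quantitative result lacks units when its `gmd:valueUnit` is missing or empty. The function name `summarize_quality` and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of the ISO 19139 encoding (assumed record format).
NS = {"gmd": "http://www.isotc211.org/2005/gmd"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def _pct(part, whole):
    """Percentage rounded to two decimals; 0.0 when the denominator is zero."""
    return round(100.0 * part / whole, 2) if whole else 0.0


def _has_unit(result):
    """True if a DQ_QuantitativeResult carries a non-empty gmd:valueUnit."""
    unit = result.find("gmd:valueUnit", NS)
    if unit is None:
        return False
    # A unit is either embedded as a child element or referenced via xlink:href.
    return len(unit) > 0 or unit.get(XLINK_HREF) is not None


def summarize_quality(records):
    """Summarize quality and lineage coverage over an iterable of XML strings."""
    total = with_quality = with_lineage = 0
    results = results_without_unit = 0
    for xml_text in records:
        root = ET.fromstring(xml_text)
        total += 1
        if root.find(".//gmd:dataQualityInfo", NS) is not None:
            with_quality += 1
        if root.find(".//gmd:lineage", NS) is not None:
            with_lineage += 1
        for result in root.iterfind(".//gmd:DQ_QuantitativeResult", NS):
            results += 1
            if not _has_unit(result):
                results_without_unit += 1
    return {
        "records": total,
        "quality_pct": _pct(with_quality, total),
        "lineage_pct": _pct(with_lineage, total),
        "results_without_unit_pct": _pct(results_without_unit, results),
    }
```

Calling `summarize_quality` on a list of harvested record strings returns the three percentages in one pass. A real analysis would also need to handle malformed records and profiles other than ISO 19139, which this sketch ignores.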
