Similarity Measurement of Metadata of Geospatial Data: An Artificial Neural Network Approach

To help users discover the most relevant spatial datasets in the ever-growing global spatial data infrastructures (SDIs), a number of metadata-based similarity measures for geospatial data have been proposed. Researchers have assessed the similarity of geospatial data according to one or more characteristics of the data. They created a separate similarity algorithm for each selected characteristic and then combined these elementary similarities into an overall similarity of the geospatial data. The existing combination methods are mainly linear and may not be the most accurate. This paper reports our experience in attempting to learn the optimal non-linear similarity integration function from the knowledge of experts, using an artificial neural network. First, a multiple-layer feed forward neural network (MLFFN) was created. Then, the intrinsic characteristics were used to represent the metadata of geospatial data, and a similarity algorithm was built for each intrinsic characteristic. The training and evaluation data for the MLFFN were derived from the knowledge of domain experts. Finally, the MLFFN was trained, evaluated, and compared with traditional linear combination methods, which were mainly weighted sums. The results show that our method outperformed the existing methods in terms of precision. Moreover, we found that the way experts combine elementary similarities into the overall similarity of geospatial data is not linear.
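The contrast between a linear combination and a feed-forward combination of elementary similarities can be sketched as follows. This is only an illustrative sketch, not the network described in the paper: the three input similarities, the layer size, and all weight values are hypothetical and untrained, and the function names are introduced here for illustration.

```python
import math


def weighted_sum(sims, weights):
    """Linear baseline: weighted average of the elementary similarities."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, sims)) / total


def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_similarity(sims, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of a feed-forward network with one hidden layer.

    Each hidden unit applies a sigmoid to a weighted sum of the inputs;
    the output unit applies a sigmoid to a weighted sum of the hidden units.
    """
    hidden = [
        sigmoid(sum(w * s for w, s in zip(row, sims)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)


# Hypothetical elementary similarities for one pair of datasets, each in [0, 1].
sims = [0.9, 0.4, 0.7]

# Linear combination with assumed weights.
linear = weighted_sum(sims, [0.5, 0.3, 0.2])

# Non-linear combination with assumed (untrained) network parameters.
hidden_w = [[2.0, -1.0, 0.5], [-1.5, 2.5, 1.0]]
hidden_b = [0.0, -0.5]
out_w = [1.2, -0.8]
out_b = 0.1
nonlinear = mlp_similarity(sims, hidden_w, hidden_b, out_w, out_b)

print(linear, nonlinear)
```

In practice the network parameters would be fitted to expert-rated similarity judgments, for example by back-propagation, rather than set by hand as they are here.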
