A Structural-Lexical Measure of Semantic Similarity for Geo-Knowledge Graphs

Graphs have become ubiquitous structures for encoding geographic knowledge online. The Semantic Web’s linked open data, folksonomies, wiki websites, and open gazetteers can be seen as geo-knowledge graphs, that is, labeled graphs whose vertices represent geographic concepts and whose edges encode the relations between concepts. To compute the semantic similarity of concepts in such structures, this article defines the network-lexical similarity measure (NLS). This measure estimates similarity by combining two complementary sources of information: the network similarity of the vertices and the semantic similarity of their lexical definitions. NLS is evaluated on the OpenStreetMap Semantic Network, a crowdsourced geo-knowledge graph that describes geographic concepts. The hybrid approach outperforms both the network and the lexical measures used on their own, obtaining very strong correlation with the similarity judgments of human subjects.
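To make the idea of a hybrid score concrete, the following minimal Python sketch combines a network score and a lexical score for a pair of concepts. It is an illustration under stated assumptions, not the NLS definition: the abstract does not say how the two sources are combined, so the linear weighting with `alpha` is an assumption, and Jaccard overlap stands in for both the network measure and the lexical measure. The function names, the toy graph, and the toy definitions are all hypothetical.

```python
# Illustrative sketch only: a linear blend of two stand-in similarity scores.
# The weighting scheme and both component measures are assumptions, not NLS.

def jaccard(a, b):
    """Jaccard overlap of two collections, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def network_similarity(graph, u, v):
    """Stand-in network measure: overlap of the neighbor sets of u and v."""
    return jaccard(graph.get(u, ()), graph.get(v, ()))

def lexical_similarity(definitions, u, v):
    """Stand-in lexical measure: token overlap of the two definitions."""
    tokens_u = definitions[u].lower().split()
    tokens_v = definitions[v].lower().split()
    return jaccard(tokens_u, tokens_v)

def hybrid_similarity(graph, definitions, u, v, alpha=0.5):
    """Weighted blend; alpha=1 uses only the network, alpha=0 only the text."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    net = network_similarity(graph, u, v)
    lex = lexical_similarity(definitions, u, v)
    return alpha * net + (1.0 - alpha) * lex

# Toy geo-knowledge graph: each concept maps to the set of its neighbors.
graph = {
    "amenity=pub": {"key:amenity", "amenity=bar"},
    "amenity=bar": {"key:amenity", "amenity=pub"},
    "natural=water": {"key:natural"},
}

# Toy lexical definitions (invented for this example).
definitions = {
    "amenity=pub": "a place selling beer and food",
    "amenity=bar": "a place selling alcoholic drinks",
    "natural=water": "a body of standing water",
}

pub_bar = hybrid_similarity(graph, definitions, "amenity=pub", "amenity=bar")
pub_water = hybrid_similarity(graph, definitions, "amenity=pub", "natural=water")
print(pub_bar > pub_water)
```

In this toy setting the two amenity concepts share a neighbor and several definition words, so their blended score is higher than that of the pub and water pair. A real system would replace both stand-ins with stronger measures and tune `alpha` against human similarity judgments.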
