A holistic approach to aligning geospatial data with multidimensional similarity measuring

ABSTRACT Semantically aligning heterogeneous geospatial datasets (GDs) produced by different organizations demands efficient similarity matching methods. However, the strategies used to align schemas (concepts and properties) and instances are usually not reusable, and the effects of unbalanced information tend to be neglected in GD alignment. To address these problems, this paper presents a holistic approach that aligns geospatial entities (concepts, properties and instances) simultaneously. Spatial, lexical, structural and extensional similarity metrics are designed and automatically aggregated by means of approval voting. The approach is validated on two real geographical semantic web datasets, Geonames and OpenStreetMap. Compared with a well-known extension-based alignment system, the presented approach not only considers more of the information involved in GD alignment, but also avoids manual parameter setting in metric aggregation. It reduces the dependency on any specific kind of information and makes the alignment more robust when the various kinds of information are unevenly distributed.
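
To make the idea of aggregating metrics by approval voting more concrete, the following is a minimal sketch and not the paper's actual procedure, which the abstract does not specify. It assumes that each similarity metric acts as a voter and approves the candidate(s) it scores highest for a given source entity, and that the candidate with the most approvals is taken as the match. The function name `approval_vote`, the identifiers `osm:A` and `osm:B`, and all scores are hypothetical.

```python
from collections import Counter


def approval_vote(scores):
    """Aggregate per-metric similarity scores by approval voting.

    scores: {metric_name: {candidate: similarity}}
    Assumption (not from the paper): each metric approves the
    candidate(s) with its highest score; ties are all approved.
    Returns (winners, tally).
    """
    tally = Counter()
    for metric, by_candidate in scores.items():
        if not by_candidate:
            continue  # a metric with no information casts no vote
        best = max(by_candidate.values())
        for candidate, similarity in by_candidate.items():
            if similarity == best:
                tally[candidate] += 1
    if not tally:
        return [], {}
    top = max(tally.values())
    winners = sorted(c for c, n in tally.items() if n == top)
    return winners, dict(tally)


# Hypothetical scores for one source entity against two candidates.
scores = {
    "spatial": {"osm:A": 0.9, "osm:B": 0.4},
    "lexical": {"osm:A": 0.7, "osm:B": 0.8},
    "structural": {"osm:A": 0.6, "osm:B": 0.5},
    "extensional": {},  # no instance data available for this entity
}
winners, tally = approval_vote(scores)
print(winners, tally)
```

In this sketch no weights or thresholds are set by hand, and a metric that lacks information simply abstains, which is one way the aggregation could stay usable when information is unevenly distributed.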
