A comprehensive methodology for discovering semantic relationships among geospatial vocabularies using oceanographic data discovery as an example

ABSTRACT It is challenging to find relevant data for research and development purposes in the geospatial big data era. One long-standing problem in data discovery is locating, assimilating and utilizing the semantic context for a given query. Most research in the geospatial domain has approached this problem in one of two ways: building a domain-specific ontology manually, or automatically discovering semantic relationships using metadata and machine learning techniques. The former relies on rich expert knowledge but is static, costly and labor intensive, whereas the latter is automatic but prone to noise. An emerging trend in information science takes advantage of large-scale user search histories, which are dynamic but subject to user- and crawler-generated noise. To leverage the benefits of these three approaches while avoiding their weaknesses, a novel methodology is proposed to (1) discover vocabulary-based semantic relationships from user search histories and clickstreams, (2) refine the similarity calculation methods from existing ontologies and (3) integrate the results of ontology, metadata, user search history and clickstream analysis to better determine their semantic relationships. An accuracy assessment of the similarity values by domain experts indicates an 83% overall accuracy for the top 10 related terms over randomly selected sample queries. This research serves as an example for building vocabulary-based semantic relationships for different geographical domains to improve various aspects of data discovery, including the accuracy of the vocabulary relationships of commonly used search terms.
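As a rough illustration of steps (1) and (3), the sketch below derives a clickstream-based similarity between two queries and merges it with a score from another source. It is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not specify the similarity measure or the integration rule, so cosine similarity over click counts and a weighted average are stand-ins chosen here. The function names, the toy click log, the ontology score of 0.5 and all weights are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def click_vectors(click_log):
    """Map each query to a {dataset: click count} vector.

    `click_log` is an iterable of (query, clicked_dataset) pairs; this
    input shape is an assumption made for illustration.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for query, dataset in click_log:
        vectors[query][dataset] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def integrate(scores, weights):
    """Weighted average over the sources that actually report a score.

    Sources missing from `scores` are skipped rather than counted as zero.
    """
    total = sum(weights[source] for source in scores)
    if not total:
        return 0.0
    return sum(weights[source] * scores[source] for source in scores) / total


# Hypothetical click log: two queries leading to the same datasets.
click_log = [
    ("sea surface temperature", "dataset_a"),
    ("sea surface temperature", "dataset_b"),
    ("sst", "dataset_a"),
    ("sst", "dataset_b"),
    ("ocean wind", "dataset_c"),
]
vectors = click_vectors(click_log)
click_sim = cosine(vectors["sea surface temperature"], vectors["sst"])

# Hypothetical ontology score and source weights, for illustration only.
combined = integrate(
    {"clickstream": click_sim, "ontology": 0.5},
    {"clickstream": 2.0, "ontology": 1.0, "metadata": 1.0},
)
print(round(click_sim, 3), round(combined, 3))
```

Queries that lead users to the same datasets receive a high click-based similarity, while queries with no shared clicks score zero. Averaging only over the sources that report a score keeps a term pair from being penalized simply because it is absent from the ontology or the metadata.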
