A Smart Web-Based Geospatial Data Discovery System with Oceanographic Data as an Example

Discovering and accessing geospatial data presents a significant challenge for the Earth sciences community, as massive amounts of data are produced daily. In this article, we report a smart web-based geospatial data discovery system that mines and utilizes data relevancy from metadata and user behavior. Specifically, (1) the system enables semantic query expansion and suggestion to assist users in finding more relevant data; (2) machine-learned ranking is used to improve search ranking based on a number of identified ranking features that reflect users’ search preferences; (3) a hybrid recommendation module allows users to discover related data by considering both metadata attributes and user behavior; (4) an integrated graphical user interface is designed to guide data consumers quickly and intuitively to the appropriate data resources. As a proof of concept, we focus on a well-defined domain, oceanography, and use oceanographic data discovery as an example. Experiments and a search example show that the proposed system can improve the scientific community’s data search experience by providing query expansion, suggestion, better search ranking, and data recommendation via a user-friendly interface.
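
To make the idea of a hybrid recommendation module concrete, the following is a minimal sketch, not the system's actual implementation. It assumes that each dataset is described by a set of metadata keywords and that user behavior is available as a list of sessions (datasets viewed together); it blends a metadata similarity with a session co-occurrence similarity using a simple weighted sum. The function names, the `weight` parameter, the Jaccard measure, and the sample dataset identifiers are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hybrid_recommend(target, metadata, sessions, weight=0.5, top_k=3):
    """Rank datasets related to `target` (illustrative assumption only).

    metadata: dict mapping dataset id -> set of metadata keywords
    sessions: list of sessions, each a list of dataset ids viewed together
    weight:   share of the score given to metadata similarity
    """
    # Map each dataset to the set of session indices in which it appears.
    seen_in = defaultdict(set)
    for index, session in enumerate(sessions):
        for dataset in session:
            seen_in[dataset].add(index)

    scores = {}
    for dataset in metadata:
        if dataset == target:
            continue
        content = jaccard(metadata[target], metadata[dataset])
        behavior = jaccard(seen_in[target], seen_in[dataset])
        scores[dataset] = weight * content + (1 - weight) * behavior

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


# Hypothetical sample data, invented for this sketch.
metadata = {
    "sst_l4": {"sea surface temperature", "ocean", "level 4"},
    "sst_l2": {"sea surface temperature", "ocean", "level 2"},
    "wind": {"ocean wind", "ocean", "level 2"},
}
sessions = [["sst_l4", "sst_l2"], ["sst_l4", "wind"], ["sst_l4", "sst_l2"]]

recommended = hybrid_recommend("sst_l4", metadata, sessions)
print(recommended)
```

A weighted sum is only one way to combine the two signals; the appropriate measures and weighting would depend on the metadata and logs actually available.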
