Towards intelligent geospatial data discovery: a machine learning framework for search ranking

ABSTRACT Search engines in most geospatial data portals currently lead users to rank results along a single data characteristic (e.g. popularity or release date). This approach largely fails to account for users' multidimensional preferences for geospatial data, and is therefore likely to result in a suboptimal experience when users try to discover the most applicable dataset. This study reports a machine learning framework that addresses the ranking challenge, the fundamental obstacle in geospatial data discovery, by (1) identifying a set of ranking features that represent users' multidimensional preferences, drawing on semantics, user behavior, spatial similarity, and static dataset metadata attributes; (2) applying a machine learning method to learn a ranking function automatically; and (3) proposing a system architecture that combines existing search-oriented open source software, a semantic knowledge base, ranking feature extraction, and a machine learning algorithm. Results show that the machine learning approach outperforms the other methods evaluated, in terms of both precision at K and normalized discounted cumulative gain. As an early attempt to use machine learning to improve search ranking in the geospatial domain, we expect this work to set an example for further research and to open the door towards intelligent geospatial data discovery.
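The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The function names, the graded relevance labels, and the exponential gain form (2^rel - 1) with a log2 position discount are assumptions made for illustration; the abstract does not state which variant of normalized discounted cumulative gain the study uses.

```python
import math
from typing import Sequence


def precision_at_k(relevances: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked results that are relevant (label > 0).

    Assumption: the denominator is always k, even if fewer than k
    results were returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k results.

    Assumption: gain is 2^rel - 1 and the discount is log2(rank + 1),
    with ranks starting at 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for one query, in ranked order
# (0 = not relevant, higher = more relevant).
ranked_labels = [3, 0, 2, 0, 1]
p_at_3 = precision_at_k(ranked_labels, 3)
ndcg_at_5 = ndcg_at_k(ranked_labels, 5)
```

Precision at K only counts how many of the first K results are relevant, whereas normalized discounted cumulative gain also rewards placing the most relevant datasets nearest the top, which is why the two are commonly reported together.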
