Geographical query reformulation using a geographical adjacency taxonomy builder and word senses

Purpose: Formulating geographical queries is one of the key difficulties users face with search engines. The purpose of this study is to improve geographical search by proposing a novel geographical query reformulation (GQR) technique that uses a geographical taxonomy and word senses.

Design/methodology/approach: This work introduces an approach to GQR that combines three elements: a method for separating the components of a query using GeoNames, a technique for reformulating these components using WordNet, and a geographic taxonomy constructed with the latent semantic analysis method.

Findings: The proposed approach was compared with two methods from the literature, using the mean average precision (MAP) and the precision at 20 documents (P@20). The experimental results show that it outperforms the other techniques by 15.73% to 31.21% in terms of P@20 and by 17.81% to 35.52% in terms of MAP.

Research limitations/implications: According to the experimental results, the best taxonomy created with the geographical adjacency taxonomy builder contains 7.67% incorrect links. The authors believe that building the taxonomy from a much larger volume of data could give better results. Future work will therefore apply the approach in a big data context.

Originality/value: Despite the incorrect links that remain in the taxonomy, the proposed reformulation approach considerably improves the precision of geographical queries and retrieves relevant documents that the original queries did not retrieve. The strengths of the technique lie in reformulating both the thematic and the spatial entities and in replacing the spatial entity of the query, using a geographical taxonomy, with terms that express the intent of the query more precisely.
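The findings are reported in terms of P@20 and MAP. As a reminder of what these two measures compute, here is a minimal Python sketch using their standard textbook definitions. It is not the evaluation code used in the study: the function names and the toy document identifiers are illustrative assumptions, and each ranked list is assumed to contain no duplicate documents.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 20) -> float:
    """Fraction of the top-k retrieved documents that are relevant (P@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    # Dividing by k (not by the number retrieved) penalises short result lists.
    return hits / k


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Average of the per-query average precision over all queries (MAP)."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical toy data: two queries with their ranked results and relevance sets.
query_1 = (["d1", "d2", "d3", "d4"], {"d1", "d3"})
query_2 = (["a", "b"], {"b"})
map_score = mean_average_precision([query_1, query_2])
p_at_2 = precision_at_k(*query_1, k=2)
```

With these definitions, a reformulated query improves P@20 when more relevant documents appear in the first 20 results, and improves MAP when relevant documents move higher in the ranking across the whole query set.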
