Geographical queries reformulation using a parallel association rules generator to build spatial taxonomies

Geographical queries require a dedicated reformulation process in information retrieval systems (IRS) because of their specific characteristics and hierarchical structure, a fact that most web search engines ignore. In this paper, we propose an automatic approach for building a spatial taxonomy that models the notion of adjacency and is used to reformulate the spatial part of a geographical query. The approach exploits the top-ranked documents retrieved when a spatial entity, composed of a spatial relation and the name of a city, is submitted. A transactional database is then constructed in which each retrieved document is a transaction containing the names of the cities that share the country of the city in the submitted query. The frequent pattern growth (FP-growth) algorithm is applied to this database in its parallel version (parallel FP-growth, PFP) to generate association rules, which form the country's taxonomy in a Big Data context. Experiments were conducted on Spark, and the results show that query reformulation using the taxonomy built with the proposed approach improves the precision and effectiveness of the IRS.
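
To make the pipeline concrete, the following is a minimal sketch of the two steps described above: turning retrieved documents into transactions of city names, then deriving association rules between cities. It is an illustration under stated assumptions, not the implementation used in the paper. The function names, the toy documents, the city list, and the thresholds are all hypothetical, and a brute-force pairwise count stands in for FP-growth and its parallel version so that the example runs without Spark.

```python
from collections import Counter
from itertools import combinations


def build_transactions(documents, country_cities):
    """Map each document to the set of known city names it mentions.

    Assumption: naive case-insensitive substring matching stands in for
    whatever toponym recognition a real system would use.
    """
    transactions = []
    for text in documents:
        lowered = text.lower()
        found = {city for city in country_cities if city.lower() in lowered}
        if found:
            transactions.append(found)
    return transactions


def pairwise_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between pairs of cities.

    This is a brute-force pair count, NOT FP-growth: it only illustrates
    what support and confidence mean for single-city rules.
    """
    n = len(transactions)
    if n == 0:
        return []
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        item_counts.update(t)
        pair_counts.update(combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical inputs: stand-ins for top-ranked documents and a city list.
cities = ["Rabat", "Casablanca", "Kenitra", "Agadir"]
docs = [
    "Hotels near Rabat and Kenitra",
    "Train from Rabat to Kenitra and Casablanca",
    "Rabat travel guide",
    "Beaches of Agadir",
]

for lhs, rhs, support, confidence in pairwise_rules(build_transactions(docs, cities)):
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

In this toy setting, a rule such as `Kenitra -> Rabat` would be read as a candidate adjacency link in the taxonomy. On Spark, the rule-mining step would presumably be delegated to a parallel FP-growth implementation such as `pyspark.ml.fpm.FPGrowth`, with the transactions stored as an array column; the exact configuration used in the experiments is not stated here.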
