Coupled intrinsic and extrinsic human language resource-based query expansion

Abstract

Poor information retrieval performance is often attributed to the query-document vocabulary mismatch problem: users find it difficult to formulate natural language queries whose vocabulary matches that of the documents relevant to their search goal. To alleviate this problem, query expansion generates additional terms and integrates them into the initial query. This requires accurately identifying the main query concepts, so that the intended search goal is emphasized and relevant expansion concepts are extracted and included in the enriched query. Natural language queries have intrinsic linguistic properties, such as part-of-speech labels and grammatical relations, which can be used to determine the intended search goal. In addition, extrinsic language-based resources such as ontologies are needed to suggest expansion concepts that are semantically coherent with the query content. We present a query expansion framework that capitalizes on both the linguistic characteristics of user queries and ontology resources for query constituent encoding, expansion concept extraction and concept weighting. A thorough empirical evaluation on real-world datasets validates our approach against a unigram language model, a relevance model and a sequential dependence-based technique.
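The general idea of coupling intrinsic query properties with an extrinsic resource can be illustrated with a minimal sketch. It is not the framework evaluated in the paper: the part-of-speech lexicon, the toy ontology, the weight values and the function name `expand_query` are all hypothetical and chosen only for illustration. Content words are kept and weighted by their part of speech, and each one contributes related ontology concepts at a reduced weight.

```python
from typing import Dict, List, Tuple

# Hypothetical toy part-of-speech lexicon; a real system would use a tagger.
POS_LEXICON: Dict[str, str] = {
    "effects": "NOUN", "of": "ADP", "pollution": "NOUN",
    "on": "ADP", "marine": "ADJ", "life": "NOUN",
}

# Hypothetical toy ontology: concept -> list of (related concept, relation).
ONTOLOGY: Dict[str, List[Tuple[str, str]]] = {
    "pollution": [("contamination", "synonym"),
                  ("environmental condition", "hypernym")],
    "life": [("organism", "related")],
}

# Illustrative weights (assumptions, not values taken from the paper).
POS_WEIGHT: Dict[str, float] = {"NOUN": 1.0, "ADJ": 0.6}
RELATION_WEIGHT: Dict[str, float] = {
    "synonym": 0.5, "hypernym": 0.3, "related": 0.2,
}


def expand_query(query: str) -> List[Tuple[str, float]]:
    """Return weighted original and expansion concepts for a query."""
    weighted: Dict[str, float] = {}
    for token in query.lower().split():
        pos = POS_LEXICON.get(token, "")
        base = POS_WEIGHT.get(pos)
        if base is None:
            continue  # skip function words and tokens missing from the lexicon
        # Intrinsic step: weight the query term by its part of speech.
        weighted[token] = max(weighted.get(token, 0.0), base)
        # Extrinsic step: add ontology neighbours at a discounted weight.
        for concept, relation in ONTOLOGY.get(token, []):
            score = base * RELATION_WEIGHT[relation]
            weighted[concept] = max(weighted.get(concept, 0.0), score)
    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(weighted.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for concept, weight in expand_query("effects of pollution on marine life"):
        print(f"{weight:.2f}  {concept}")
```

In this sketch, function words such as "of" and "on" receive no weight and are dropped, while expansion concepts always rank below the query term that produced them because every relation weight is below one.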
