Relevance-based Word Embedding

Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically learned from term proximity in a large corpus: the objective in well-known word embedding algorithms, e.g., word2vec, is to accurately predict the adjacent word(s) for a given word or context. However, this objective is not necessarily equivalent to the goal of many information retrieval (IR) tasks, whose primary objective is to capture relevance rather than term proximity, syntactic, or even semantic similarity. This is the motivation for developing unsupervised relevance-based word embedding models that learn word representations from query-document relevance information. In this paper, we propose two learning models with different objective functions: one learns a relevance distribution over the vocabulary set for each query, and the other classifies each term as belonging to the relevant or the non-relevant class for each query. To train our models, we use over six million unique queries and the top-ranked documents retrieved in response to each query, which are assumed to be relevant to the query. We extrinsically evaluate the learned word representations on two IR tasks: query expansion and query classification. Both the query expansion experiments on four TREC collections and the query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.
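Since the abstract names the two objective functions but does not spell them out, the following is a minimal sketch, in PyTorch, of how such objectives could look. It is not the authors' implementation: the names (`W`, `distribution_loss`, `classification_loss`), the shapes, and the use of pseudo-relevance-feedback targets are all illustrative assumptions.

```python
# Hypothetical sketch of the two training objectives described in the
# abstract: (1) matching a relevance distribution over the vocabulary,
# (2) classifying terms as relevant vs. non-relevant per query.
import torch
import torch.nn.functional as F

vocab_size, dim = 50_000, 200
# Output (vocabulary) embedding matrix; a query vector would come from
# composing the embeddings of the query's terms.
W = torch.nn.Parameter(0.01 * torch.randn(vocab_size, dim))

def distribution_loss(query_vec, target_dist):
    """Objective 1: learn a relevance distribution over the vocabulary.

    `target_dist` is a [vocab_size] probability vector, e.g. a relevance
    model estimated from the top-ranked (pseudo-relevant) documents.
    """
    logits = W @ query_vec                        # [vocab_size] term scores
    log_p = F.log_softmax(logits, dim=0)          # model's distribution
    return F.kl_div(log_p, target_dist, reduction="sum")  # KL(target || model)

def classification_loss(query_vec, pos_ids, neg_ids):
    """Objective 2: classify terms as relevant vs. non-relevant to the query.

    `pos_ids` holds terms sampled from pseudo-relevant documents;
    `neg_ids` holds negative samples from the collection background.
    """
    pos = W[pos_ids] @ query_vec                  # scores of relevant terms
    neg = W[neg_ids] @ query_vec                  # scores of sampled negatives
    return (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos))
            + F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))

# Toy usage: gradients flow into the embedding matrix W.
q = torch.randn(dim)
target = torch.softmax(torch.randn(vocab_size), dim=0)
loss = distribution_loss(q, target) + classification_loss(
    q, torch.tensor([3, 17, 42]), torch.tensor([7, 99, 123]))
loss.backward()
```

In both cases the supervision signal comes for free from the top-ranked retrieved documents, which is what makes the approach unsupervised in the sense used in the abstract.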

[1] J. J. Rocchio, et al. Relevance feedback in information retrieval, 1971.

[2] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.

[3] T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.

[4] W. Bruce Croft, et al. An Association Thesaurus for Information Retrieval, 1994, RIAO.

[5] W. Bruce Croft, et al. A language modeling approach to information retrieval, 1998, SIGIR '98.

[6] John D. Lafferty, et al. Model-based feedback in the language modeling approach to information retrieval, 2001, CIKM '01.

[7] John D. Lafferty, et al. Document language models, query models, and risk minimization for information retrieval, 2001, SIGIR '01.

[8] W. Bruce Croft, et al. Relevance-Based Language Models, 2001, SIGIR '01.

[9] Peter Bruza, et al. Inferring query models by computing information flow, 2002, CIKM '02.

[10] Jaana Kekäläinen, et al. Cumulated gain-based evaluation of IR techniques, 2002, TOIS.

[11] C. J. van Rijsbergen, et al. Probabilistic models of information retrieval based on measuring the divergence from randomness, 2002, TOIS.

[12] W. Bruce Croft, et al. Cross-lingual relevance models, 2002, SIGIR '02.

[13] ChengXiang Zhai, et al. A study of smoothing methods for language models applied to information retrieval, 2004, TOIS.

[14] Fernando Diaz, et al. UMass at TREC 2004: Novelty and HARD, 2004, TREC.

[15] Ying Li, et al. KDD CUP-2005 report: facing a great challenge, 2005, SIGKDD Explorations.

[16] Yoshua Bengio, et al. Hierarchical Probabilistic Neural Network Language Model, 2005, AISTATS.

[17] Tao Tao, et al. Regularized estimation of mixture models for robust pseudo-relevance feedback, 2006, SIGIR.

[18] Abdur Chowdhury, et al. A picture of search, 2006, InfoScale '06.

[19] Geoffrey E. Hinton, et al. A Scalable Hierarchical Distributed Language Model, 2008, NIPS.

[20] Charles L. A. Clarke, et al. Efficient and effective spam filtering and re-ranking for large web datasets, 2010, Information Retrieval.

[21] Mark Levene, et al. Search Engines: Information Retrieval in Practice, 2011, Comput. J.

[22] Aapo Hyvärinen, et al. Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics, 2012, J. Mach. Learn. Res.

[23] Florent Perronnin, et al. Aggregating Continuous Word Embeddings for Information Retrieval, 2013, CVSM@ACL.

[24] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[25] Yoshua Bengio, et al. Learning Concept Embeddings for Query Expansion by Quantum Entropy Minimization, 2014, AAAI.

[26] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[27] Omer Levy, et al. Neural Word Embedding as Implicit Matrix Factorization, 2014, NIPS.

[28] M. de Rijke, et al. Short Text Similarity with Word Embeddings, 2015, CIKM.

[29] James P. Callan, et al. Learning to Reweight Terms with Distributed Representations, 2015, SIGIR.

[30] Marie-Francine Moens, et al. Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings, 2015, SIGIR.

[31] Matt J. Kusner, et al. From Word Embeddings To Document Distances, 2015, ICML.

[32] Fernando Diaz, et al. Condensed List Relevance Models, 2015, ICTIR.

[33] Xiaodong Liu, et al. Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval, 2015, NAACL.

[34] John D. Lafferty, et al. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval, 2003, SIGIR.

[35] Po Hu, et al. Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering, 2015, ACL.

[36] Jiafeng Guo, et al. Analysis of the Paragraph Vector Model for Information Retrieval, 2016, ICTIR.

[37] Azadeh Shakery, et al. Pseudo-Relevance Feedback Based on Matrix Factorization, 2016, CIKM.

[38] Allan Hanbury, et al. Generalizing Translation Models in the Probabilistic Relevance Framework, 2016, CIKM.

[39] Oren Kurland, et al. Query Expansion Using Word Embeddings, 2016, CIKM.

[40] W. Bruce Croft, et al. Embedding-based Query Language Models, 2016, ICTIR.

[41] Nick Craswell, et al. Query Expansion with Locally-Trained Word Embeddings, 2016, ACL.

[42] Tefko Saracevic, et al. The Notion of Relevance in Information Science: Everybody knows what relevance is. But, what is it really?, 2016.

[43] W. Bruce Croft, et al. Estimating Embedding Vectors for Queries, 2016, ICTIR.

[44] Hamed Zamani, et al. Situational Context for Ranking in Personal Search, 2017, WWW.

[45] W. Bruce Croft, et al. Query Expansion Using Local and Global Document Analysis, 1996, SIGIR Forum.

[46] W. Bruce Croft, et al. Neural Ranking Models with Weak Supervision, 2017, SIGIR.

[47] Allan Hanbury, et al. Word Embedding Causes Topic Shifting; Exploit Global Context!, 2017, SIGIR.