Query Expansion with Locally-Trained Word Embeddings

Continuous-space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus-specific and query-specific embeddings on retrieval tasks. These results suggest that other tasks that benefit from global embeddings may also benefit from local embeddings.
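As a rough illustration of embedding-based query expansion, the sketch below ranks vocabulary terms by cosine similarity to the centroid of the query terms and returns the top candidates. It is a minimal sketch under stated assumptions, not the method evaluated in this work: the function names, the centroid-plus-cosine scoring, and both two-dimensional embedding tables are hypothetical and hand-built to show how a globally trained and a locally trained embedding could suggest different expansion terms for the same query.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query_terms, embeddings, k=2):
    """Return the k vocabulary terms closest to the query centroid.

    Assumption: relatedness is scored as cosine similarity to the mean
    of the query term vectors; other scoring rules are possible.
    """
    known = [t for t in query_terms if t in embeddings]
    if not known:
        return []
    dim = len(embeddings[known[0]])
    centroid = [sum(embeddings[t][i] for t in known) / len(known)
                for i in range(dim)]
    scored = [(cosine(centroid, vec), term)
              for term, vec in embeddings.items()
              if term not in query_terms]
    scored.sort(reverse=True)
    return [term for _, term in scored[:k]]


# Hypothetical toy vectors, NOT trained on any corpus.
# "Global": the term "cut" sits near its general-purpose neighbours.
GLOBAL_EMB = {
    "cut": [0.9, 0.1],
    "trim": [0.85, 0.15],
    "slash": [0.8, 0.2],
    "tax": [0.1, 0.9],
    "deficit": [0.2, 0.8],
}

# "Local": as if trained only on documents about a fiscal-policy topic,
# so "cut" sits near topic-specific terms instead.
LOCAL_EMB = {
    "cut": [0.2, 0.9],
    "trim": [0.9, 0.1],
    "slash": [0.8, 0.3],
    "tax": [0.25, 0.9],
    "deficit": [0.3, 0.85],
}

global_expansion = expand_query(["cut"], GLOBAL_EMB)
local_expansion = expand_query(["cut"], LOCAL_EMB)
print("global:", global_expansion)
print("local: ", local_expansion)
```

In practice the two tables would come from training the same embedding model on different data: the whole collection for the global table, and a corpus-specific or query-specific subset of documents for the local one.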
