Neural Vector Spaces for Unsupervised Information Retrieval

We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. Adding NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness, which indicates that NVSM contributes a complementary relevance signal. Besides semantic matching, we find that NVSM also performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments.
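
The ranking step described above can be illustrated with a minimal sketch. It is not the NVSM implementation: the vectors are hand-set toy values rather than learned by gradient descent, words and documents are assumed to share one space, any learned transformation between them is omitted, and averaging plus cosine similarity are assumed here as the composition and similarity functions. All names (`WORD_VECS`, `DOC_VECS`, `compose_query`, `rank`) are hypothetical.

```python
import math

# Hypothetical toy representations; in NVSM these would be learned from the
# document collection by gradient descent.
WORD_VECS = {
    "stock": [0.9, 0.1],
    "market": [0.8, 0.2],
    "football": [0.1, 0.9],
}
DOC_VECS = {
    "doc_finance": [1.0, 0.0],
    "doc_sports": [0.0, 1.0],
}


def compose_query(terms, word_vecs):
    """Average the vectors of the query terms that have a representation."""
    known = [word_vecs[t] for t in terms if t in word_vecs]
    if not known:
        raise ValueError("no query term has a representation")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(terms, word_vecs, doc_vecs):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    query = compose_query(terms, word_vecs)
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(["stock", "market"], WORD_VECS, DOC_VECS)
```

With these toy values, the query "stock market" places `doc_finance` ahead of `doc_sports`, because the averaged query vector points mostly along the first dimension.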
