Neural Information Retrieval: A Literature Review

A recent "third wave" of Neural Network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, this new NN research is commonly referred to as deep learning. Stemming from this tide of NN work, a number of researchers have recently begun to investigate NN approaches to Information Retrieval (IR). While deep NNs have yet to achieve the same level of success in IR as seen in other areas, the recent surge of interest and work in NNs for IR suggests that this state of affairs may be quickly changing. In this work, we survey the current landscape of Neural IR research, paying special attention to the use of learned representations of queries and documents (i.e., neural embeddings). We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research.
