Getting Started with Neural Models for Semantic Matching in Web Search

Vocabulary mismatch is a long-standing problem in information retrieval, and semantic matching holds the promise of solving it. Recent advances in language technology have given rise to unsupervised neural models for learning representations of words as well as larger textual units. Such representations enable powerful semantic matching methods. This survey is meant as an introduction to the use of neural models for semantic matching. To remain focused, we limit ourselves to web search. We detail the required background and terminology, present a taxonomy grouping the rapidly growing body of work in the area, and then survey work on neural models for semantic matching in the context of three tasks: query suggestion, ad retrieval, and document retrieval. We include a section on resources and best practices that we believe will help readers who are new to the area. We conclude with an assessment of the state of the art and suggestions for future work.
