An Introduction to Neural Information Retrieval

Neural models have been employed in many Information Retrieval scenarios, including ad-hoc retrieval, recommender systems, multi-media search, and even conversational systems that generate answers in response to natural language questions. An Introduction to Neural Information Retrieval provides a tutorial introduction to neural methods for ranking documents in response to a query, an important IR task. The monograph provides a complete picture of neural information retrieval techniques that culminate in supervised neural learning to rank models including deep neural network architectures that are trained end-to-end for ranking tasks. In reaching this point, the authors cover all the important topics, including the learning to rank framework and an overview of deep neural networks. This monograph provides an accessible, yet comprehensive, overview of the state-of-the-art of Neural Information Retrieval.

[1]  Quoc V. Le,et al.  A Neural Conversational Model , 2015, ArXiv.

[2]  Bhaskar Mitra,et al.  Benchmark for Complex Answer Retrieval , 2017, ICTIR.

[3]  J. R. Firth,et al.  A Synopsis of Linguistic Theory, 1930-1955 , 1957 .

[4]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[5]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[6]  Blockin Blockin,et al.  Quick Training of Probabilistic Neural Nets by Importance Sampling , 2003 .

[7]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[8]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[9]  Donna K. Harman,et al.  Overview of the TREC 2015 LiveQA Track , 2015, TREC.

[10]  Richard Hans Robert Hahnloser,et al.  Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit , 2000, Nature.

[11]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[12]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Roy Harris Saussure and His Interpreters , 1988 .

[14]  Tie-Yan Liu,et al.  Ranking Measures and Loss Functions in Learning to Rank , 2009, NIPS.

[15]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[16]  John D. Lafferty,et al.  Information retrieval as statistical translation , 1999, SIGIR '99.

[17]  Christopher J. C. Burges,et al.  High accuracy retrieval with multiple nested ranker , 2006, SIGIR.

[18]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[19]  Abdur Chowdhury,et al.  A picture of search , 2006, InfoScale '06.

[20]  Magnus Sahlgren,et al.  The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces , 2006 .

[21]  W. Bruce Croft,et al.  Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval , 2016, SIGIR.

[22]  Dong Yu,et al.  Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features , 2016, KDD.

[23]  Alexandre Allauzen,et al.  Structured Output Layer neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[24]  Naftali Tishby,et al.  The information bottleneck method , 2000, ArXiv.

[25]  Ashish Vaswani,et al.  Simple, Fast Noise-Contrastive Estimation for Large RNN Vocabularies , 2016, HLT-NAACL.

[26]  Yelong Shen,et al.  Learning semantic representations using convolutional neural networks for web search , 2014, WWW.

[27]  Stephen E. Robertson,et al.  Okapi at TREC-4 , 1995, TREC.

[28]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[29]  Marie-Francine Moens,et al.  Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings , 2015, SIGIR.

[30]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[31]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[32]  Jaap Kamps,et al.  Learning to Learn from Weak Supervision by Full Supervision , 2017, ArXiv.

[33]  R. Plackett The Analysis of Permutations , 1975 .

[34]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[35]  Norbert Fuhr,et al.  Optimum polynomial retrieval functions based on the probability ranking principle , 1989, TOIS.

[36]  José Luis Vicedo González,et al.  TREC: Experiment and evaluation in information retrieval , 2007, J. Assoc. Inf. Sci. Technol..

[37]  Gerard de Melo,et al.  Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval , 2017, WSDM.

[38]  Peter Norvig,et al.  The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.

[39]  W. Bruce Croft,et al.  A Markov random field model for term dependencies , 2005, SIGIR '05.

[40]  R. Duncan Luce,et al.  Individual Choice Behavior , 1959 .

[41]  Yisong Yue,et al.  On Using Simultaneous Perturbation Stochastic Approximation for IR Measures , and the Empirical Optimality of LambdaRank , 2007 .

[42]  Lin Ma,et al.  Multimodal Convolutional Neural Networks for Matching Image and Sentence , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[43]  Ronan Collobert,et al.  Word Embeddings through Hellinger PCA , 2013, EACL.

[44]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[45]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[46]  Pascal Vincent,et al.  An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family , 2015, ICLR.

[47]  Filip Radlinski,et al.  On user interactions with query auto-completion , 2014, SIGIR.

[48]  W. Bruce Croft,et al.  Embedding-based Query Language Models , 2016, ICTIR.

[49]  Xiangji Huang,et al.  Proximity-based rocchio's model for pseudo relevance , 2012, SIGIR '12.

[50]  Jennifer Chu-Carroll,et al.  Building Watson: An Overview of the DeepQA Project , 2010, AI Mag..

[51]  Kevin Duh,et al.  DyNet: The Dynamic Neural Network Toolkit , 2017, ArXiv.

[52]  Rabab Kreidieh Ward,et al.  Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[53]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[54]  Pascal Poupart,et al.  Variational Attention for Sequence-to-Sequence Models , 2017, COLING.

[55]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  W. Bruce Croft,et al.  End to End Long Short Term Memory Networks for Non-Factoid Question Answering , 2016, ICTIR.

[57]  Douglas L. T. Rohde,et al.  An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence , 2005 .

[58]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[59]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[60]  Omer Levy,et al.  word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method , 2014, ArXiv.

[61]  Pascal Vincent,et al.  The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family , 2016, ArXiv.

[62]  Xuan Liu,et al.  Multi-view Response Selection for Human-Computer Conversation , 2016, EMNLP.

[63]  James P. Callan,et al.  Learning to Reweight Terms with Distributed Representations , 2015, SIGIR.

[64]  Pradeep Dubey,et al.  BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies , 2015, ICLR.

[65]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.

[66]  Steven Levy,et al.  In the Plex: How Google Thinks, Works, and Shapes Our Lives , 2011 .

[67]  Qiang Wu,et al.  Adapting boosting for information retrieval measures , 2010, Information Retrieval.

[68]  Xin Rong,et al.  word2vec Parameter Learning Explained , 2014, ArXiv.

[69]  Bhaskar Mitra,et al.  Reply With: Proactive Recommendation of Email Attachments , 2017, CIKM.

[70]  John D. Lafferty,et al.  Document Language Models, Query Models, and Risk Minimization for Information Retrieval , 2001, SIGIR Forum.

[71]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[72]  W. Bruce Croft,et al.  Semantic Matching by Non-Linear Word Transportation for Information Retrieval , 2016, CIKM.

[73]  Jason Weston,et al.  Large-scale Simple Question Answering with Memory Networks , 2015, ArXiv.

[74]  Gerard de Melo,et al.  PACRR: A Position-Aware Neural IR Model for Relevance Matching , 2017, EMNLP.

[75]  Xueqi Cheng,et al.  Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN , 2016, IJCAI.

[76]  Christoph Goller,et al.  Learning task-dependent distributed representations by backpropagation through structure , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).

[77]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[78]  Jianfeng Gao,et al.  deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets , 2015, ACL.

[79]  Bhaskar Mitra,et al.  Report on the Second SIGIR Workshop on Neural Information Retrieval (Neu-IR'17) , 2018, SIGF.

[80]  W. Bruce Croft,et al.  aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model , 2016, CIKM.

[81]  Pascal Vincent,et al.  Exact gradient updates in time independent of output size for the spherical loss family , 2016, ArXiv.

[82]  Tong Zhang,et al.  Subset Ranking Using Regression , 2006, COLT.

[83]  Misha Denil,et al.  Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network , 2014, ArXiv.

[84]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[85]  Milad Shokouhi,et al.  From Queries to Cards: Re-ranking Proactive Card Recommendations Based on Reactive Search History , 2015, SIGIR.

[86]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[87]  David Hawking,et al.  Overview of the TREC-2002 Web Track , 2002, TREC.

[88]  John C. Platt,et al.  Learning Discriminative Projections for Text Similarity Measures , 2011, CoNLL.

[89]  Fabrizio Silvestri,et al.  Context- and Content-aware Embeddings for Query Rewriting in Sponsored Search , 2015, SIGIR.

[90]  Ivan Markovsky,et al.  Low Rank Approximation - Algorithms, Implementation, Applications , 2018, Communications and Control Engineering.

[91]  Bhaskar Mitra,et al.  Optimizing Query Evaluations Using Reinforcement Learning for Web Search , 2018, SIGIR.

[92]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.

[93]  M. de Rijke,et al.  Pyndri: A Python Interface to the Indri Search Engine , 2017, ECIR.

[94]  Stephen E. Robertson,et al.  Microsoft Cambridge at TREC 13: Web and Hard Tracks , 2004, TREC.

[95]  Rabab Kreidieh Ward,et al.  Semantic Modelling with Long-Short-Term Memory for Information Retrieval , 2014, ArXiv.

[96]  Marc'Aurelio Ranzato,et al.  Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[97]  Filip Radlinski,et al.  A support vector method for optimizing average precision , 2007, SIGIR.

[98]  Nick Craswell,et al.  Learning to Match using Local and Distributed Representations of Text for Web Search , 2016, WWW.

[99]  Chris Buckley,et al.  Pivoted Document Length Normalization , 1996, SIGIR Forum.

[100]  Ruslan Salakhutdinov,et al.  A Multiplicative Model for Learning Distributed Text-Based Attribute Representations , 2014, NIPS.

[101]  Aapo Hyvärinen,et al.  Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[102]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[103]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[104]  Yang Song,et al.  Query-Less: Predicting Task Repetition for NextGen Proactive Search and Recommendation Engines , 2016, WWW.

[105]  Alessandro Lenci,et al.  Distributional Memory: A General Framework for Corpus-Based Semantics , 2010, CL.

[106]  Peter Bailey,et al.  Engineering a multi-purpose test collection for Web retrieval experiments , 2003, Inf. Process. Manag..

[107]  Tie-Yan Liu,et al.  Listwise approach to learning to rank: theory and algorithm , 2008, ICML '08.

[108]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[109]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[110]  Georgiana Dinu,et al.  Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors , 2014, ACL.

[111]  Thorsten Joachims,et al.  Unbiased Learning-to-Rank with Biased Feedback , 2016, WSDM.

[112]  F. Saussure,et al.  Course in General Linguistics , 1960 .

[113]  Chen Sun,et al.  Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[114]  Xiaodong Liu,et al.  Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval , 2015, NAACL.

[115]  Pinar Donmez,et al.  On the local optimality of LambdaRank , 2009, SIGIR.

[116]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[117]  Bhaskar Mitra,et al.  Exploring Session Context using Distributed Representations of Queries and Reformulations , 2015, SIGIR.

[118]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[119]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[120]  Xueqi Cheng,et al.  A Study of MatchPyramid Models on Ad-hoc Retrieval , 2016, ArXiv.

[121]  Thorsten Joachims,et al.  Eye-tracking analysis of user behavior in WWW search , 2004, SIGIR '04.

[122]  Joshua Goodman,et al.  Classes for fast maximum entropy training , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[123]  Fernando Diaz,et al.  Robust models of mouse movement on dynamic web search results pages , 2013, CIKM.

[124]  Krisztian Balog,et al.  Anticipating Information Needs Based on Check-in Activity , 2017, WSDM.

[125]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[126]  W. Bruce Croft,et al.  A Deep Relevance Matching Model for Ad-hoc Retrieval , 2016, CIKM.

[127]  David J. C. MacKay,et al.  A hierarchical Dirichlet language model , 1995, Natural Language Engineering.

[128]  Ryen W. White,et al.  Anticipatory search: using context to initiate search , 2012, SIGIR '12.

[129]  Yoshua Bengio,et al.  Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model , 2008, IEEE Transactions on Neural Networks.

[130]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[131]  Omer Levy,et al.  Linguistic Regularities in Sparse and Explicit Word Representations , 2014, CoNLL.

[132]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[133]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[134]  Xueqi Cheng,et al.  Modeling Document Novelty with Neural Tensor Network for Search Result Diversification , 2016, SIGIR.

[135]  Florent Perronnin,et al.  Aggregating Continuous Word Embeddings for Information Retrieval , 2013, CVSM@ACL.

[136]  Xiaojun Wan,et al.  The earth mover's distance as a semantic measure for document similarity , 2005, CIKM '05.

[137]  Donna K. Harman,et al.  Overview of the Eighth Text REtrieval Conference (TREC-8) , 1999, TREC.

[138]  Geoffrey Zweig,et al.  Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.

[139]  Thorsten Joachims,et al.  Accurately interpreting clickthrough data as implicit feedback , 2005, SIGIR '05.

[140]  W. Bruce Croft,et al.  Relevance-Based Language Models , 2001, SIGIR '01.

[141]  Geoffrey E. Hinton,et al.  Generating Text with Recurrent Neural Networks , 2011, ICML.

[142]  Stephen E. Robertson,et al.  Optimisation methods for ranking functions with multiple parameters , 2006, CIKM '06.

[143]  Curt Burgess,et al.  Producing high-dimensional semantic spaces from lexical co-occurrence , 1996 .

[144]  Sanjeev Arora,et al.  RAND-WALK: A Latent Variable Model Approach to Word Embeddings , 2015 .

[145]  Patrick Pantel,et al.  From Frequency to Meaning: Vector Space Models of Semantics , 2010, J. Artif. Intell. Res..

[146]  Bhaskar Mitra,et al.  Neu-IR: The SIGIR 2016 Workshop on Neural Information Retrieval , 2016, SIGIR.

[147]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[148]  C. L. Mallows NON-NULL RANKING MODELS. I , 1957 .

[149]  Christopher D. Manning Understanding Human Language: Can NLP and Deep Learning Help? , 2016, SIGIR.

[150]  Xueqi Cheng,et al.  Text Matching as Image Recognition , 2016, AAAI.

[151]  Qiang Wang,et al.  Benchmarking State-of-the-Art Deep Learning Software Tools , 2016, 2016 7th International Conference on Cloud Computing and Big Data (CCBD).

[152]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[153]  W. Bruce Croft,et al.  Search Engines - Information Retrieval in Practice , 2009 .

[154]  Nick Craswell,et al.  Neural Models for Full Text Search , 2017, WSDM.

[155]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[156]  W. Bruce Croft,et al.  A Language Modeling Approach to Information Retrieval , 1998, SIGIR Forum.

[157]  Geoffrey Zweig,et al.  Speed regularization and optimality in word classing , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[158]  Yoshua Bengio,et al.  Hierarchical Probabilistic Neural Network Language Model , 2005, AISTATS.

[159]  Lantao Yu,et al.  SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient , 2016, AAAI.

[160]  Stephen E. Robertson,et al.  Simple BM25 extension to multiple weighted fields , 2004, CIKM '04.

[161]  Matt J. Kusner,et al.  Supervised Word Mover's Distance , 2016, NIPS.

[162]  Bhaskar Mitra,et al.  An Eye-tracking Study of User Interactions with Query Auto Completion , 2014, CIKM.

[163]  Christopher J. C. Burges,et al.  From RankNet to LambdaRank to LambdaMART: An Overview , 2010 .

[164]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[165]  Edward Cutrell,et al.  An eye tracking study of the effect of target rank on web search , 2007, CHI.

[166]  Bowen Zhou,et al.  ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs , 2015, TACL.

[167]  Djoerd Hiemstra,et al.  Using language models for information retrieval , 2001 .

[168]  Bhaskar Mitra,et al.  Neural Networks for Information Retrieval , 2017, SIGIR.

[169]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[170]  Bhaskar Mitra,et al.  Neural Text Embeddings for Information Retrieval , 2017, WSDM.

[171]  Bhaskar Mitra,et al.  Improving Document Ranking with Dual Word Embeddings , 2016, WWW.

[172]  Filip Radlinski,et al.  Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search , 2007, TOIS.

[173]  Razvan Pascanu,et al.  On the Number of Linear Regions of Deep Neural Networks , 2014, NIPS.

[174]  Alexander M. Rush,et al.  Character-Aware Neural Language Models , 2015, AAAI.

[175]  Eric Brill Processing Natural Language without Natural Language Processing , 2003, CICLing.

[176]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[177]  Mandar Mitra,et al.  Word Embedding based Generalized Language Model for Information Retrieval , 2015, SIGIR.

[178]  Peng Zhang,et al.  IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models , 2017, SIGIR.

[179]  Bhaskar Mitra,et al.  Cross Domain Regularization for Neural Ranking Models using Adversarial Learning , 2018, SIGIR.

[180]  Moustapha Cissé,et al.  Efficient softmax approximation for GPUs , 2016, ICML.

[181]  Jannis Bulian,et al.  Ask the Right Questions: Active Question Reformulation with Reinforcement Learning , 2017, ICLR.

[182]  Bhaskar Mitra,et al.  Report on the SIGIR 2016 Workshop on Neural Information Retrieval (Neu-IR) , 2016, SIGIR Forum.

[183]  Gene H. Golub,et al.  Singular value decomposition and least squares solutions , 1970, Milestones in Matrix Computation.

[184]  Nick Craswell Mean Reciprocal Rank , 2009, Encyclopedia of Database Systems.

[185]  Omer Levy,et al.  Improving Distributional Similarity with Lessons Learned from Word Embeddings , 2015, TACL.

[186]  Phil Blunsom,et al.  A Convolutional Neural Network for Modelling Sentences , 2014, ACL.

[187]  Qiang Wu,et al.  McRank: Learning to Rank Using Multiple Classification and Gradient Boosting , 2007, NIPS.

[188]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[189]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[190]  Jianfeng Gao,et al.  A Human Generated MAchine Reading COmprehension Dataset , 2018 .

[191]  Yinglian Xie,et al.  Locality in search engine queries and its implications for caching , 2002, Proceedings.Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.

[192]  Jakob Grue Simonsen,et al.  A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion , 2015, CIKM.

[193]  John A Bullinaria,et al.  Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD , 2012, Behavior Research Methods.

[194]  Ming-Wei Chang,et al.  A Knowledge-Grounded Neural Conversation Model , 2017, AAAI.

[195]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[196]  Geoffrey E. Hinton,et al.  A Scalable Hierarchical Distributed Language Model , 2008, NIPS.

[197]  Jaana Kekäläinen,et al.  Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[198]  Hang Li,et al.  A Deep Architecture for Matching Short Texts , 2013, NIPS.

[199]  Tie-Yan Liu,et al.  Learning to rank for information retrieval , 2009, SIGIR.

[200]  Wang Ling,et al.  Learning to Compose Words into Sentences with Reinforcement Learning , 2016, ICLR.

[201]  Utpal Garain,et al.  Using Word Embeddings for Automatic Query Expansion , 2016, ArXiv.

[202]  栄藤 稔 ビッグデータとパターン認識 ~ More data usually beats better algorithms? ~ , 2011 .

[203]  Bhaskar Mitra,et al.  Luandri: A Clean Lua Interface to the Indri Search Engine , 2017, SIGIR.

[204]  D. Chandler Semiotics For Beginners , 2011 .

[205]  Alessandro Moschitti,et al.  Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks , 2015, SIGIR.

[206]  Jimmy J. Lin,et al.  A cascade ranking model for efficient ranked retrieval , 2011, SIGIR.

[207]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[208]  Rui Yan,et al.  Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System , 2016, SIGIR.

[209]  Xueqi Cheng,et al.  Sparse Word Embeddings Using ℓ1 Regularized Online Learning , 2016, IJCAI.

[210]  Xueqi Cheng,et al.  Semantic Regularities in Document Representations , 2016, ArXiv.

[211]  Tie-Yan Liu,et al.  Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.

[212]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[213]  Yelong Shen,et al.  A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval , 2014, CIKM.

[214]  Christopher D. Manning,et al.  Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.

[215]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[216]  Yi Yang,et al.  WikiQA: A Challenge Dataset for Open-Domain Question Answering , 2015, EMNLP.

[217]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[218]  Jiafeng Guo,et al.  Analysis of the Paragraph Vector Model for Information Retrieval , 2016, ICTIR.

[219]  Yoshua Bengio,et al.  On Using Very Large Target Vocabulary for Neural Machine Translation , 2014, ACL.

[220]  W. Bruce Croft,et al.  Indri : A language-model based search engine for complex queries ( extended version ) , 2005 .

[221]  W. Bruce Croft,et al.  Neural Ranking Models with Weak Supervision , 2017, SIGIR.

[222]  Pierre Baldi,et al.  Neural Networks for Fingerprint Recognition , 1993, Neural Computation.

[223]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[224]  M. de Rijke,et al.  Learning Latent Vector Spaces for Product Search , 2016, CIKM.

[225]  Chih-Hung Hsieh,et al.  Towards better measurement of attention and satisfaction in mobile search , 2014, SIGIR.

[226]  Markus Koskela,et al.  LSTM-Based Predictions for Proactive Information Retrieval , 2016, SIGIR 2016.

[227]  Xiaohui Yan,et al.  Learning Topics in Short Texts by Non-negative Matrix Factorization on Term Correlation Matrix , 2013, SDM.

[228]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[229]  Marcel Worring,et al.  Unsupervised, Efficient and Semantic Expertise Retrieval , 2016, WWW.

[230]  Xueqi Cheng,et al.  Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations , 2015, ACL.

[231]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[232]  Ellen M. Voorhees,et al.  Building a question answering test collection , 2000, SIGIR '00.

[233]  Lukás Burget,et al.  Extensions of recurrent neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[234]  Matthew Richardson,et al.  MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text , 2013, EMNLP.

[235]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[236]  Yee Whye Teh,et al.  A fast and simple algorithm for training neural probabilistic language models , 2012, ICML.

[237]  W. Bruce Croft,et al.  Adaptability of Neural Networks on Varying Granularity IR Tasks , 2016, ArXiv.

[238]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[239]  Bhaskar Mitra,et al.  Query Auto-Completion for Rare Prefixes , 2015, CIKM.

[240]  Xiaojun Wan,et al.  A novel document similarity measure based on earth mover's distance , 2007, Inf. Sci..

[241]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data , 1980 .

[242]  Nick Craswell,et al.  Query Expansion with Locally-Trained Word Embeddings , 2016, ACL.

[243]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Indexing , 1999, SIGIR Forum.

[244]  Jason Weston,et al.  The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations , 2015, ICLR.

[245]  Bhaskar Mitra,et al.  Neural Ranking Models with Multiple Document Fields , 2017, WSDM.

[246]  Leonidas J. Guibas,et al.  A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[247]  M. de Rijke,et al.  Siamese CBOW: Optimizing Word Embeddings for Sentence Representations , 2016, ACL.

[248]  Laure Soulier,et al.  Toward a Deep Neural Approach for Knowledge-Based IR , 2016, SIGIR 2016.

[249]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[250]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[251]  Wenlin Chen,et al.  Strategies for Training Large Vocabulary Neural Language Models , 2015, ACL.

[252]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[253]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[254]  Yoshua Bengio,et al.  Attention-Based Models for Speech Recognition , 2015, NIPS.

[255]  Maarten de Rijke,et al.  A Context-aware Time Model for Web Search , 2016, SIGIR.

[256]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[257]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[258]  Yi Fang,et al.  Variational Deep Semantic Hashing for Text Documents , 2017, SIGIR.

[259]  Matt J. Kusner,et al.  From Word Embeddings To Document Distances , 2015, ICML.

[260]  Tao Qin,et al.  LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval , 2007 .

[261]  Guido Zuccon,et al.  Integrating and Evaluating Neural Word Embeddings in Information Retrieval , 2015, ADCS.

[262]  Jianfeng Gao,et al.  Modeling Interestingness with Deep Neural Networks , 2014, EMNLP.

[263]  Ashish Vaswani,et al.  Decoding with Large-Scale Neural Language Models Improves Translation , 2013, EMNLP.

[264]  M. de Rijke,et al.  A Neural Click Model for Web Search , 2016, WWW.

[265]  Manik Varma,et al.  Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications , 2016, KDD.

[266]  Nemanja Djuric,et al.  Search Retargeting using Directed Query Embeddings , 2015, WWW.

[267]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[268]  Stephen E. Robertson,et al.  Extending average precision to graded relevance judgments , 2010, SIGIR.

[269]  Hang Li,et al.  Convolutional Neural Network Architectures for Matching Natural Language Sentences , 2014, NIPS.

[270]  John Cocke,et al.  A Statistical Approach to Machine Translation , 1990, CL.

[271]  W. Bruce Croft,et al.  Estimating Embedding Vectors for Queries , 2016, ICTIR.

[272]  Jason Weston,et al.  Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.

[273]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[274]  Susan T. Dumais,et al.  Similarity Measures for Short Segments of Text , 2007, ECIR.

[275]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[276]  Xueqi Cheng,et al.  A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations , 2015, AAAI.

[277]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[278]  Jianfeng Gao,et al.  A Neural Network Approach to Context-Sensitive Generation of Conversational Responses , 2015, NAACL.

[279]  J. Bullinaria,et al.  Extracting semantic representations from word co-occurrence statistics: A computational study , 2007, Behavior research methods.

[280]  Christopher Potts,et al.  A Fast Unified Model for Parsing and Sentence Understanding , 2016, ACL.

[281]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[282]  Quoc V. Le,et al.  Learning to Rank with Nonsmooth Cost Functions , 2006, Neural Information Processing Systems.

[283]  Bhaskar Mitra,et al.  A Proposal for Evaluating Answer Distillation from Web Data , 2016 .

[284]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[285]  Chris Dyer,et al.  Notes on Noise Contrastive Estimation and Negative Sampling , 2014, ArXiv.

[286]  Han Zhao,et al.  Self-Adaptive Hierarchical Sentence Model , 2015, IJCAI.

[287]  Gregory Shakhnarovich,et al.  FractalNet: Ultra-Deep Neural Networks without Residuals , 2016, ICLR.

[288]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[289]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences by , 2013 .

[290]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[291]  Larry P. Heck,et al.  Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.

[292]  Yonghui Wu,et al.  Exploring the Limits of Language Modeling , 2016, ArXiv.

[293]  Bhaskar Mitra,et al.  A Dual Embedding Space Model for Document Ranking , 2016, ArXiv.

[294]  Robert Hecht-Nielsen,et al.  Theory of the backpropagation neural network , 1989, International 1989 Joint Conference on Neural Networks.

[295]  Joelle Pineau,et al.  How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation , 2016, EMNLP.

[296]  Yann LeCun,et al.  Convolutional networks and applications in vision , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[297]  Kyunghyun Cho,et al.  Task-Oriented Query Reformulation with Reinforcement Learning , 2017, EMNLP.