A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval

In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. To capture the rich contextual structures in a query or a document, the model first projects each word, together with a temporal context window around it in the word sequence, into a feature vector, directly capturing contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are discovered by the model and aggregated into a sentence-level feature vector. Finally, a non-linear transformation extracts high-level semantic information, yielding a continuous vector representation of the full text string. The proposed convolutional latent semantic model (CLSM) is trained on clickthrough data and evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that the proposed model effectively captures salient semantic information in queries and documents for the task, significantly outperforming previous state-of-the-art semantic models.
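
To make the pipeline described in the abstract concrete, the sketch below walks through the forward pass in NumPy: word n-gram (context-window) features, max pooling into a sentence-level vector, a non-linear semantic projection, and cosine scoring of a query against a document. The dimensions, random parameters, per-word input representation, and the names clsm_vector and relevance are illustrative assumptions rather than details from the paper, and the clickthrough-based training is omitted.

```python
# Minimal sketch of a convolutional-pooling semantic model forward pass,
# under assumed dimensions; not the paper's trained configuration.
import numpy as np

rng = np.random.default_rng(0)

WORD_DIM = 1000    # assumed per-word input feature dimensionality
WINDOW = 3         # width of the temporal context window (word n-gram level)
CONV_DIM = 300     # assumed number of convolutional feature maps
SEM_DIM = 128      # assumed size of the final semantic vector

# Randomly initialized parameters; in the paper these would be learned
# from clickthrough data, which this sketch omits.
W_conv = rng.normal(scale=0.01, size=(WINDOW * WORD_DIM, CONV_DIM))
W_sem = rng.normal(scale=0.01, size=(CONV_DIM, SEM_DIM))


def clsm_vector(word_features: np.ndarray) -> np.ndarray:
    """Map a word sequence (shape [num_words, WORD_DIM], one feature
    vector per word) to a low-dimensional semantic vector."""
    num_words = word_features.shape[0]
    pad = WINDOW // 2
    # 1. Slide a temporal context window over the word sequence and
    #    concatenate each window into a word-n-gram-level feature vector.
    padded = np.vstack([np.zeros((pad, WORD_DIM)),
                        word_features,
                        np.zeros((pad, WORD_DIM))])
    windows = np.stack([padded[t:t + WINDOW].reshape(-1)
                        for t in range(num_words)])
    # 2. Convolution: project each window and apply a tanh non-linearity.
    local = np.tanh(windows @ W_conv)      # [num_words, CONV_DIM]
    # 3. Max pooling: keep the most salient n-gram feature per dimension,
    #    giving a fixed-length sentence-level feature vector.
    pooled = local.max(axis=0)             # [CONV_DIM]
    # 4. Non-linear semantic projection to the final representation.
    return np.tanh(pooled @ W_sem)         # [SEM_DIM]


def relevance(query_vec: np.ndarray, doc_vec: np.ndarray) -> float:
    """Cosine similarity used to rank documents against a query."""
    return float(query_vec @ doc_vec /
                 (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec) + 1e-8))


# Toy usage: random stand-ins for a 5-word query and a 20-word document.
q = clsm_vector(rng.random((5, WORD_DIM)))
d = clsm_vector(rng.random((20, WORD_DIM)))
print(relevance(q, d))
```

In this sketch the max over word positions is what lets features detected anywhere in the text contribute to the sentence-level vector, which is the role the abstract assigns to the pooling layer.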
