A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval

In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. To capture the rich contextual structures in a query or a document, the model first projects each word, together with a temporal context window around it in the word sequence, into a feature vector, directly capturing contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are discovered by the model and aggregated into a sentence-level feature vector. Finally, a non-linear transformation extracts high-level semantic information, yielding a continuous vector representation of the full text string. The proposed convolutional latent semantic model (CLSM) is trained on clickthrough data and evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that the proposed model effectively captures salient semantic information in queries and documents for the task, significantly outperforming previous state-of-the-art semantic models.
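
To make the pipeline described in the abstract concrete, the sketch below walks through the forward pass in NumPy: word n-gram (context-window) features, max pooling into a sentence-level vector, a non-linear semantic projection, and cosine scoring of a query against a document. The dimensions, random parameters, per-word input representation, and the names clsm_vector and relevance are illustrative assumptions rather than details from the paper, and the clickthrough-based training is omitted.

```python
# Minimal sketch of a convolutional-pooling semantic model forward pass,
# under assumed dimensions; not the paper's trained configuration.
import numpy as np

rng = np.random.default_rng(0)

WORD_DIM = 1000    # assumed per-word input feature dimensionality
WINDOW = 3         # width of the temporal context window (word n-gram level)
CONV_DIM = 300     # assumed number of convolutional feature maps
SEM_DIM = 128      # assumed size of the final semantic vector

# Randomly initialized parameters; in the paper these would be learned
# from clickthrough data, which this sketch omits.
W_conv = rng.normal(scale=0.01, size=(WINDOW * WORD_DIM, CONV_DIM))
W_sem = rng.normal(scale=0.01, size=(CONV_DIM, SEM_DIM))


def clsm_vector(word_features: np.ndarray) -> np.ndarray:
    """Map a word sequence (shape [num_words, WORD_DIM], one feature
    vector per word) to a low-dimensional semantic vector."""
    num_words = word_features.shape[0]
    pad = WINDOW // 2
    # 1. Slide a temporal context window over the word sequence and
    #    concatenate each window into a word-n-gram-level feature vector.
    padded = np.vstack([np.zeros((pad, WORD_DIM)),
                        word_features,
                        np.zeros((pad, WORD_DIM))])
    windows = np.stack([padded[t:t + WINDOW].reshape(-1)
                        for t in range(num_words)])
    # 2. Convolution: project each window and apply a tanh non-linearity.
    local = np.tanh(windows @ W_conv)      # [num_words, CONV_DIM]
    # 3. Max pooling: keep the most salient n-gram feature per dimension,
    #    giving a fixed-length sentence-level feature vector.
    pooled = local.max(axis=0)             # [CONV_DIM]
    # 4. Non-linear semantic projection to the final representation.
    return np.tanh(pooled @ W_sem)         # [SEM_DIM]


def relevance(query_vec: np.ndarray, doc_vec: np.ndarray) -> float:
    """Cosine similarity used to rank documents against a query."""
    return float(query_vec @ doc_vec /
                 (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec) + 1e-8))


# Toy usage: random stand-ins for a 5-word query and a 20-word document.
q = clsm_vector(rng.random((5, WORD_DIM)))
d = clsm_vector(rng.random((20, WORD_DIM)))
print(relevance(q, d))
```

In this sketch the max over word positions is what lets features detected anywhere in the text contribute to the sentence-level vector, which is the role the abstract assigns to the pooling layer.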
