Modeling Interestingness with Deep Neural Networks

This paper presents a deep semantic similarity model (DSSM), a special type of deep neural networks designed for text analysis, for recommending target documents to be of interest to a user based on a source document that she is reading. We observe, identify, and detect naturally occurring signals of interestingness in click transitions on the Web between source and target documents, which we collect from commercial Web browser logs. The DSSM is trained on millions of Web transitions, and maps source-target document pairs to feature vectors in a latent space in such a way that the distance between source documents and their corresponding interesting targets in that space is minimized. The effectiveness of the DSSM is demonstrated using two interestingness tasks: automatic highlighting and contextual entity search. The results on large-scale, real-world datasets show that the semantics of documents are important for modeling interestingness and that the DSSM leads to significant quality improvement on both tasks, outperforming not only the classic document models that do not use semantics but also state-of-the-art topic models.

[1]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .

[2]  John C. Platt,et al.  Learning Discriminative Projections for Text Similarity Measures , 2011, CoNLL.

[3]  Andrei Z. Broder,et al.  A semantic approach to contextual advertising , 2007, SIGIR.

[4]  Bernardo A. Huberman,et al.  Predicting the popularity of online content , 2008, Commun. ACM.

[5]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6]  Tad Hogg,et al.  Using a model of social dynamics to predict popularity of news , 2010, WWW '10.

[7]  Deepa Paranjpe,et al.  Learning document aboutness from implicit user feedback and document structure , 2009, CIKM.

[8]  Yelong Shen,et al.  A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval , 2014, CIKM.

[9]  Bernardo A. Huberman,et al.  The Pulse of News in Social Media: Forecasting Popularity , 2012, ICWSM.

[10]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[11]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[12]  Jianshu Chen,et al.  A Primal-Dual Method for Training Recurrent Neural Networks Constrained by the Echo-State Property , 2013 .

[13]  John D. Lafferty,et al.  Information retrieval as statistical translation , 1999, SIGIR '99.

[14]  Michael Gamon,et al.  Identifying salient entities in web pages , 2013, CIKM.

[15]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[16]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[17]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[18]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[19]  Michael Gamon,et al.  Predicting Interesting Things in Text , 2014, COLING.

[20]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[21]  Brian Kingsbury,et al.  New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[22]  Li Deng,et al.  A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[23]  John C. Platt,et al.  Translingual Document Representations from Discriminative Projections , 2010, EMNLP.

[24]  Jaana Kekäläinen,et al.  IR evaluation methods for retrieving highly relevant documents , 2000, SIGIR '00.

[25]  Jianfeng Gao,et al.  Clickthrough-based translation models for web search: from word models to phrase models , 2010, CIKM.

[26]  Hugo Zaragoza,et al.  The Probabilistic Relevance Framework: BM25 and Beyond , 2009, Found. Trends Inf. Retr..

[27]  Geoffrey E. Hinton,et al.  Discovering Binary Codes for Documents by Learning Deep Generative Models , 2011, Top. Cogn. Sci..

[28]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[29]  Andrew Y. Ng,et al.  Semantic Compositionality through Recursive Matrix-Vector Spaces , 2012, EMNLP.

[30]  Larry P. Heck,et al.  Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.

[31]  Susan T. Dumais,et al.  Automatic cross-linguistic information retrieval using latent semantic indexing , 2007 .

[32]  Aapo Hyvärinen,et al.  Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[33]  Joshua Goodman,et al.  Finding advertising keywords on web pages , 2006, WWW '06.

[34]  Vasilis Efthymiou,et al.  Entity resolution in the web of data , 2013, Entity Resolution in the Web of Data.

[35]  Qiang Wu,et al.  Adapting boosting for information retrieval measures , 2010, Information Retrieval.

[36]  Jianfeng Gao,et al.  Clickthrough-based latent semantic models for web search , 2011, SIGIR.

[37]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[38]  Jianfeng Gao,et al.  Learning Continuous Phrase Representations for Translation Modeling , 2014, ACL.