Topic-enhanced knowledge-aware retrieval model for diverse relevance estimation

Relevance measures the relation between a query and a document and spans several dimensions, e.g., semantic similarity, topical relatedness, cognitive relevance (relations at the level of knowledge), usefulness, timeliness, and utility. However, existing retrieval models mainly focus on semantic similarity and cognitive relevance while ignoring the other dimensions. Topical relatedness, an important dimension of relevance, is not well studied in existing neural information retrieval. In this paper, we propose a Topic-Enhanced Knowledge-aware retrieval Model (TEKM) that jointly learns semantic similarity, knowledge relevance, and topical relatedness to estimate the relevance between a query and a document. We first construct a neural topic model to learn topical information and generate topic embeddings of a query. We then combine the topic embeddings with a knowledge-aware retrieval model to estimate the different dimensions of relevance. Specifically, we use kernel pooling to soft-match topic embeddings with words and entities in a unified embedding space, which yields fine-grained topical relatedness. The whole model is trained end to end. Experiments on a large-scale, publicly available benchmark dataset show that TEKM outperforms existing retrieval models. Further analysis shows how modeling topical relatedness improves a traditional retrieval model based on semantic similarity and knowledge relevance.
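As a rough illustration of the soft-matching step, the sketch below applies Gaussian kernel pooling in the style of K-NRM to cosine similarities between topic embeddings and word or entity embeddings. It is not TEKM's exact formulation: the function names, the kernel means `mus`, the shared width `sigma`, and the toy two-dimensional vectors are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def kernel_pooling(topic_vecs, token_vecs, mus, sigma=0.1):
    """Return one soft-match feature per kernel (hypothetical helper).

    topic_vecs: topic embeddings of the query
    token_vecs: word or entity embeddings of the document
    mus:        kernel means, each targeting one similarity level
    sigma:      shared kernel width (assumed; real models may tune it per kernel)
    """
    features = []
    for mu in mus:
        log_sum = 0.0
        for t in topic_vecs:
            # Soft count of document tokens whose similarity to topic t is near mu.
            soft_count = sum(
                math.exp(-((cosine(t, d) - mu) ** 2) / (2 * sigma ** 2))
                for d in token_vecs
            )
            # Clamp before the log so an empty kernel does not produce -inf.
            log_sum += math.log(max(soft_count, 1e-10))
        features.append(log_sum)
    return features

# Toy example: two topic embeddings, three document token embeddings.
topics = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
mus = [1.0, 0.5, 0.0, -0.5]
feats = kernel_pooling(topics, tokens, mus)
```

Each entry of `feats` summarizes how many topic-token pairs fall near one similarity level. In a full ranking model, such features would typically be fed to a learned scoring layer alongside the word-level and entity-level match features.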
