Translation Language Model Enhancement for Community Question Retrieval Using User Adoption Answer

Community Question Answering (CQA) services on the Web provide an important alternative for knowledge acquisition. As an essential component of CQA services, question retrieval saves users time by finding previously asked questions that are relevant to a new query. However, there is a "gap" between queried questions and candidate questions, known as the lexical chasm or word mismatch problem: two questions can ask the same thing while sharing few words. In this paper, we improve the traditional Topic inference based Translation Language Model (T\(^2\)LM) by using the topic information of queries. Moreover, we make use of user information, specifically the number of a user's adopted answers, to further enhance the proposed model. In our model, the translation model and the topic model "bridge" the word gap by linking different words. In addition, user information, which has no direct relation to semantics, helps us "bypass" the gap. Combining the two yields a considerable improvement in question retrieval performance. Experimental results on a real Chinese CQA data set show that the proposed model improves retrieval performance over the T\(^2\)LM baseline by 7.5% in terms of Mean Average Precision (MAP).
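For readers unfamiliar with the reported metric, the following is a minimal sketch of how Mean Average Precision is conventionally computed over a set of queries. It is not the authors' evaluation code; the function names and the toy question identifiers are assumptions made for illustration only.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list of candidate question ids.

    Precision is sampled at every rank where a relevant item appears,
    and the sum is divided by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, qid in enumerate(ranked, start=1):
        if qid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two queries with their ranked results
# and the sets of ids judged relevant.
toy_runs = [
    (["q1", "q2", "q3"], {"q1", "q3"}),
    (["q7", "q8"], {"q8"}),
]
map_score = mean_average_precision(toy_runs)
```

A relative gain such as the one reported above would be obtained by comparing this score for the proposed model against the same score for the baseline on an identical query set.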
