Learning semantic representation with neural networks for community question answering retrieval

The semantic representation is learned using a neural network architecture. The neural network is trained in two phases, pre-training and fine-tuning. The learned semantic-level features are incorporated into a learning-to-rank (LTR) framework.

In community question answering (cQA), users pose queries (or questions) on portals like Yahoo! Answers, which can then be answered by other users who are often knowledgeable on the subject. cQA is increasingly popular on the Web because it conveniently and effectively connects users who have queries with users who have answers. In this article, we study the problem of finding previous queries (e.g., posed by other users) that may be similar to new queries, and adapting their answers as the answers to the new queries. A key challenge here is to bridge the lexical gap between new queries and old answers. For example, "company" in the queries may correspond to "firm" in the answers. To address this challenge, past research has proposed techniques similar to machine translation that "translate" old answers into ones using the words in the new queries. However, a key limitation of these works is that they assume queries and answers are parallel texts, which is rarely true in practice. As a result, the translated or rephrased answers may not look intuitive. In this article, we propose a novel approach that learns the semantic representation of queries and answers using a neural network architecture. The learned semantic-level features are then incorporated into a learning-to-rank framework. We have evaluated our approach on a large-scale data set. Results show that the approach significantly outperforms existing approaches.
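The lexical gap and the idea of combining a semantic-level feature with a lexical one can be illustrated with a small sketch. This is not the network described in the article: the three-dimensional word vectors in `EMBEDDINGS` are hand-picked for illustration, the functions `lexical_overlap`, `embed`, `cosine`, and `rank_score` are hypothetical names, and the fixed weights stand in for parameters that a learning-to-rank model would normally learn from data.

```python
import math

# Hypothetical hand-picked word vectors (assumption: illustration only,
# not vectors produced by the article's neural network).
EMBEDDINGS = {
    "company": [0.9, 0.1, 0.0],
    "firm": [0.8, 0.2, 0.1],
    "weather": [0.0, 0.1, 0.9],
}


def lexical_overlap(query, answer):
    """Jaccard overlap between two token lists (a purely lexical feature)."""
    q, a = set(query), set(answer)
    union = q | a
    return len(q & a) / len(union) if union else 0.0


def embed(tokens):
    """Average the vectors of known tokens; zero vector if none are known."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def rank_score(query, answer, weights=(0.5, 0.5)):
    """Linear combination of a lexical and a semantic feature.

    The weights are fixed here by assumption; a learning-to-rank model
    would fit them from labeled query-answer pairs.
    """
    features = (
        lexical_overlap(query, answer),
        cosine(embed(query), embed(answer)),
    )
    return sum(w * f for w, f in zip(weights, features))


if __name__ == "__main__":
    query = ["company"]
    for answer in (["firm"], ["weather"]):
        print(answer, lexical_overlap(query, answer), rank_score(query, answer))
```

In this toy setting, "company" and "firm" share no tokens, so the lexical feature alone cannot separate "firm" from "weather"; the semantic feature is what moves the related answer ahead in the ranking.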
