Word Embedding Based Correlation Model for Question/Answer Matching

With the development of community based question answering (Q&A) services, a large scale of Q&A archives have been accumulated and are an important information and knowledge resource on the web. Question and answer matching has been attached much importance to for its ability to reuse knowledge stored in these systems: it can be useful in enhancing user experience with recurrent questions. In this paper, we try to improve the matching accuracy by overcoming the lexical gap between question and answer pairs. A Word Embedding based Correlation (WEC) model is proposed by integrating advantages of both the translation model and word embedding, given a random pair of words, WEC can score their co-occurrence probability in Q&A pairs and it can also leverage the continuity and smoothness of continuous space word representation to deal with new pairs of words that are rare in the training parallel text. An experimental study on Yahoo! Answers dataset and Baidu Zhidao dataset shows this new method's promising potential.

[1]  Xuanjing Huang,et al.  Convolutional Neural Tensor Network Architecture for Community-Based Question Answering , 2015, IJCAI.

[2]  W. Bruce Croft,et al.  Finding similar questions in large question and answer archives , 2005, CIKM '05.

[3]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[4]  Po Hu,et al.  Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering , 2015, ACL.

[5]  Danqi Chen,et al.  Reasoning With Neural Tensor Networks for Knowledge Base Completion , 2013, NIPS.

[6]  Tao Wang,et al.  Convolutional Neural Networks over Tree Structures for Programming Language Processing , 2014, AAAI.

[7]  Phil Blunsom,et al.  A Convolutional Neural Network for Modelling Sentences , 2014, ACL.

[8]  Vibhu O. Mittal,et al.  Bridging the lexical chasm: statistical approaches to answer-finding , 2000, SIGIR '00.

[9]  Jimmy J. Lin,et al.  Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks , 2015, EMNLP.

[10]  Michael R. Lyu,et al.  A classification-based approach to question routing in community question answering , 2012, WWW.

[11]  Ben He,et al.  Question-answer topic model for question retrieval in community question answering , 2012, CIKM.

[12]  Yi Liu,et al.  Statistical Machine Translation for Query Expansion in Answer Retrieval , 2007, ACL.

[13]  Yoshua Bengio,et al.  Neural Probabilistic Language Models , 2006 .

[14]  Hang Li,et al.  A Deep Architecture for Matching Short Texts , 2013, NIPS.

[15]  Quoc V. Le,et al.  Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.

[16]  David Carmel,et al.  eResponder: Electronic Question Responder , 2000, CoopIS.

[17]  Joemon M. Jose,et al.  A Semantic Graph based Topic Model for Question Retrieval in Community Question Answering , 2016, WSDM.

[18]  Peter Kulchyski and , 2015 .

[19]  Preslav Nakov,et al.  Machine Translation Evaluation Meets Community Question Answering , 2016, ACL.

[20]  Christian S. Jensen,et al.  A generalized framework of exploring category information for question retrieval in community question answer archives , 2010, WWW '10.

[21]  Jaana Kekäläinen,et al.  IR evaluation methods for retrieving highly relevant documents , 2000, SIGIR '00.

[22]  Jaana Kekäläinen,et al.  IR evaluation methods for retrieving highly relevant documents , 2000, SIGIR Forum.

[23]  Idan Szpektor,et al.  Learning from the past: answering new questions with past answers , 2012, WWW.

[24]  Zhiwei Sun,et al.  Question/Answer Matching for CQA System via Combining Lexical and Sequential Information , 2015, AAAI.

[25]  Preslav Nakov,et al.  SemEval-2016 Task 3: Community Question Answering , 2019, *SEMEVAL.

[26]  Christopher Meek,et al.  Semantic Parsing for Single-Relation Question Answering , 2014, ACL.

[27]  W. Bruce Croft,et al.  Retrieval models for question and answer archives , 2008, SIGIR '08.

[28]  Christopher D. Manning,et al.  Bilingual Word Embeddings for Phrase-Based Machine Translation , 2013, EMNLP.

[29]  Mihai Surdeanu,et al.  Learning to Rank Answers on Large Online QA Collections , 2008, ACL.

[30]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[31]  JärvelinKalervo,et al.  IR evaluation methods for retrieving highly relevant documents , 2017 .

[32]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[33]  Preslav Nakov,et al.  SemEval-2015 Task 3: Answer Selection in Community Question Answering , 2015, *SEMEVAL.

[34]  Günter Neumann,et al.  Category-specific models for ranking effective paraphrases in community Question Answering , 2014, Expert Syst. Appl..

[35]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[36]  Lada A. Adamic,et al.  Knowledge sharing and yahoo answers: everyone knows something , 2008, WWW.

[37]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[38]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[39]  Li Cai,et al.  Learning the Latent Topics for Question Retrieval in Community QA , 2011, IJCNLP.