Manhattan Siamese LSTM for Question Retrieval in Community Question Answering

Community Question Answering (cQA) platforms let users post questions and receive answers from other users. We focus on question retrieval in cQA, which aims to retrieve previously asked questions that are similar to a new query, so that the answers already given to those questions can be reused to respond to it. The major challenges in this task are the shortness of the questions and the word mismatch problem: users can formulate the same query using different wording. Although question retrieval has been widely studied over the years, it has received less attention in Arabic and remains a nontrivial problem. In this paper, we address the task in both Arabic and English. We propose to vectorize the questions with word embeddings, which capture semantic and syntactic information from context. To obtain longer sequences, each question is expanded with words whose vectors are close to those of its own words. The embedding vectors are then fed into a Siamese LSTM model that takes the global context of each question into account, and the similarity between two questions is measured using the Manhattan distance. Experiments on a real-world Yahoo! Answers dataset show the effectiveness of the method in Arabic and English.
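As a rough illustration of two steps named in the abstract, the sketch below expands a question with nearest-neighbour words and scores two vectors with a Manhattan-based similarity. It is a minimal sketch under stated assumptions, not the paper's implementation: the toy embedding table, the function names `expand_question` and `manhattan_similarity`, the `top_n` parameter, and the use of cosine similarity to pick neighbours are all hypothetical, and the LSTM encoder that would produce the question representations is omitted. The exponentiated negative L1 distance is a form commonly used with Manhattan Siamese LSTMs; the abstract itself only states that the Manhattan distance is used.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "fix": [0.1, 0.9, 0.0],
    "repair": [0.12, 0.88, 0.05],
    "banana": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_question(tokens, embeddings, top_n=1):
    """Append, for each known token, its top_n closest words not already present."""
    expanded = list(tokens)
    for tok in tokens:
        if tok not in embeddings:
            continue  # out-of-vocabulary tokens are kept but not expanded
        neighbours = sorted(
            (w for w in embeddings if w != tok and w not in expanded),
            key=lambda w: cosine(embeddings[tok], embeddings[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:top_n])
    return expanded


def manhattan_similarity(h1, h2):
    """exp(-L1 distance): 1.0 for identical vectors, tending to 0 as they diverge."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h1, h2)))


expanded = expand_question(["fix", "car"], EMBEDDINGS, top_n=1)
score = manhattan_similarity([0.2, 0.4], [0.2, 0.4])
```

In a full pipeline, the two vectors passed to `manhattan_similarity` would be the final hidden states produced by the shared-weight LSTM for each expanded question, rather than hand-written lists.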
