A Supervised Learning Approach using the Combination of Semantic and Lexical Features for Arabic Community Question Answering

In this paper, we address the problem of Community Question Answering (CQA) for the Arabic language. We explore combining lexical and semantic features to improve the retrieval of candidate answers to a posted question. We present a comprehensive evaluation on the SemEval-2017 CQA dataset for Arabic. Using a supervised machine learning approach (a linear SVM), our system achieves a Mean Average Precision (MAP) of 62.85%, which outperforms the best previously reported results on this dataset. This is achieved with a mix of word embedding features, latent semantic similarity features, and other lexical similarity features.
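As a rough illustration of the idea described above, the following minimal sketch builds a two-dimensional feature vector for a question-answer pair (one lexical feature, one embedding-based semantic feature) and computes MAP over ranked candidate lists. It is not the paper's implementation: the choice of Jaccard overlap and averaged word vectors, the function names, and the toy embedding table are all assumptions made for illustration, and the linear SVM that would consume these features is omitted.

```python
import math

def jaccard(q_tokens, a_tokens):
    """Lexical feature (assumed): token-set overlap between question and answer."""
    q, a = set(q_tokens), set(a_tokens)
    union = q | a
    return len(q & a) / len(union) if union else 0.0

def mean_vector(tokens, embeddings, dim):
    """Average the vectors of tokens present in a (hypothetical) embedding table."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pair_features(q_tokens, a_tokens, embeddings, dim):
    """Combine one lexical and one semantic similarity into a feature vector."""
    semantic = cosine(mean_vector(q_tokens, embeddings, dim),
                      mean_vector(a_tokens, embeddings, dim))
    return [jaccard(q_tokens, a_tokens), semantic]

def average_precision(ranked_labels):
    """AP for one question: ranked_labels are 1/0 relevance flags in rank order."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """MAP: mean of per-question AP values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Toy data (hypothetical): two-dimensional embeddings for three tokens.
toy_embeddings = {"visa": [1.0, 0.0], "permit": [0.9, 0.1], "food": [0.0, 1.0]}
features = pair_features(["visa", "cost"], ["permit", "cost"], toy_embeddings, 2)
map_score = mean_average_precision([[1, 0, 1], [0, 1]])
```

In a supervised setup, vectors such as `features` would be computed for every question-answer pair and passed to a classifier, whose decision scores then order the candidates for the MAP computation. Official scorers may treat edge cases (for example, questions with no relevant answers) differently from this sketch.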
