Learning to Re-Rank Questions in Community Question Answering Using Advanced Features

We study the impact of different types of features for question ranking in community Question Answering (cQA): bag-of-words (BoW) models, syntactic tree kernels (TKs), and rank features. Structural kernels have never been applied to the question reranking task, i.e., question-to-question similarity, where they have to model paraphrase relations. Additionally, the informal text typically found in forums poses new challenges to the use of TKs. We compare our learning-to-rank (L2R) algorithms against a strong baseline given by the Google rank (GR). The results show that (i) the shallow structures we use in TKs are robust to noisy data and (ii) improving on GR requires effective BoW features and TKs, along with an accurate model of GR features in the L2R algorithm.
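As a rough illustration of how a BoW similarity and a rank feature might be combined to rerank candidate questions, the sketch below scores each candidate with a token-count cosine similarity plus the reciprocal of its initial search-engine position. It is not the system described above: tree kernels are omitted, the function names (`bow_cosine`, `features`, `rerank`) are hypothetical, the reciprocal-rank encoding is an assumption, and the weights are set by hand, whereas an L2R algorithm would learn them from labeled data.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between whitespace-tokenized bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def features(query: str, candidate: str, search_rank: int) -> list[float]:
    """Feature vector: [BoW similarity, reciprocal of the initial rank].

    The reciprocal-rank encoding is an assumption made for this sketch.
    """
    return [bow_cosine(query, candidate), 1.0 / search_rank]


def rerank(query: str, candidates: list[str], weights=(1.0, 0.5)) -> list[str]:
    """Reorder candidates (given in initial search-engine order) by a linear score.

    The weights are hand-set placeholders; a learned ranker would fit them.
    """
    scored = []
    for rank, text in enumerate(candidates, start=1):
        f = features(query, text, rank)
        scored.append((sum(w * x for w, x in zip(weights, f)), text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


if __name__ == "__main__":
    query = "how to renew visa in qatar"
    initial_order = [
        "best restaurants in doha",
        "how to renew a visa in qatar",
        "visa renewal fees",
    ]
    for text in rerank(query, initial_order):
        print(text)
```

With these placeholder weights, the second candidate moves to the top because its lexical overlap with the query outweighs the initial-rank advantage of the first one. The rank feature still has an effect: it keeps the first candidate ahead of the third even though the third has slightly higher lexical similarity.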
