Selecting Sentences versus Selecting Tree Constituents for Automatic Question Ranking

Community question answering (cQA) websites are aimed at users who post questions to an online forum, expecting other users to provide answers or suggestions. Unlike on other social media, the length of posted queries is not limited, and queries tend to be multi-sentence elaborations combining context, actual questions, and irrelevant information. We address the problem of question ranking: given a user's new question, retrieve previously posted questions that could be equivalent or highly relevant. This could prevent the posting of near-duplicate questions and provide the user with instantaneous answers. For the first time in cQA, we address the selection of relevant text, both at the sentence level and at the constituent level, for parse tree-based representations. Our supervised models for text selection boost the performance of a tree kernel-based machine learning model, allowing it to outperform the current state of the art on a recently released cQA evaluation framework.
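The abstract does not say which tree kernel the model uses. As a hedged illustration of how a tree kernel compares two parse trees, the sketch below implements the classic subset tree (SST) kernel of Collins and Duffy, which counts the tree fragments two trees share. It is not presented as the authors' exact kernel or representation. The nested-tuple tree encoding, the decay parameter `lam`, and all function names are assumptions made for this example. Selecting text, at the sentence or constituent level, changes which nodes enter the sum below.

```python
import math
from typing import Union

# Assumed encoding: a parse tree is a nested tuple (label, child, ...);
# a leaf (word) is a plain str.
Tree = Union[str, tuple]


def _label(t: Tree) -> str:
    return t if isinstance(t, str) else t[0]


def _production(node: tuple) -> tuple:
    # Grammar rule applied at this node, e.g. ("NP", ("D", "N")).
    return (node[0], tuple(_label(c) for c in node[1:]))


def _is_preterminal(node: tuple) -> bool:
    # A preterminal (POS-tag) node has only words as children.
    return all(isinstance(c, str) for c in node[1:])


def _internal_nodes(t: Tree) -> list:
    # Every non-leaf node of the tree, in pre-order.
    if isinstance(t, str):
        return []
    nodes = [t]
    for child in t[1:]:
        nodes.extend(_internal_nodes(child))
    return nodes


def _delta(n1: Tree, n2: Tree, lam: float) -> float:
    # With lam == 1, the number of common fragments rooted at both n1 and n2;
    # lam < 1 down-weights larger fragments.
    if isinstance(n1, str) or isinstance(n2, str):
        return 0.0
    if _production(n1) != _production(n2):
        return 0.0
    if _is_preterminal(n1):
        return lam
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        score *= 1.0 + _delta(c1, c2, lam)
    return score


def sst_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Subset tree kernel: sum of _delta over all pairs of internal nodes."""
    return sum(_delta(a, b, lam)
               for a in _internal_nodes(t1) for b in _internal_nodes(t2))


def normalized_sst_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    # Cosine-style normalization so that K(t, t) == 1.
    denom = math.sqrt(sst_kernel(t1, t1, lam) * sst_kernel(t2, t2, lam))
    return sst_kernel(t1, t2, lam) / denom if denom else 0.0
```

For example, `("NP", ("D", "the"), ("N", "dog"))` has 6 fragments and shares 3 of them with `("NP", ("D", "the"), ("N", "cat"))`, so the normalized score is 0.5. A function like this can serve as the kernel of a kernel machine such as an SVM.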
