Global Thread-level Inference for Comment Classification in Community Question Answering

Community question answering, a recent evolution of question answering in the Web context, allows a user to quickly consult the opinion of a number of people on a particular topic, thus taking advantage of the wisdom of the crowd. Here we try to help the user by automatically deciding which answers to a given question are good and which are bad. In particular, we focus on exploiting the output structure at the thread level in order to make more consistent global decisions. More specifically, we exploit the relations between pairs of comments at any distance in the thread, which we incorporate in graph-cut and ILP frameworks. We evaluated our approach on the benchmark dataset of SemEval-2015 Task 3. The results improve over the state of the art, confirming the importance of using thread-level information.
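
To make the graph-cut idea concrete, here is a minimal sketch of thread-level inference by minimum cut. It is an illustration under stated assumptions, not the formulation used in the paper: the abstract does not give the objective, so the energy below (a per-comment cost taken from an assumed local classifier probability `p_good`, plus an assumed non-negative penalty `same_label_weight` paid whenever a pair of comments receives different labels) is hypothetical, as are all function and parameter names. The max-flow routine is a plain Edmonds-Karp implementation.

```python
from collections import deque


def min_cut_labels(p_good, same_label_weight):
    """Label each comment Good/Bad by a minimum s-t cut (illustrative sketch).

    p_good: assumed local classifier probabilities that each comment is Good.
    same_label_weight: assumed dict {(i, j): w}, w >= 0, the penalty paid
        when comments i and j (at any distance in the thread) get different labels.
    """
    n = len(p_good)
    src, sink = n, n + 1  # source side = Good, sink side = Bad
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, p in enumerate(p_good):
        cap[src][i] = p          # cut (paid) if comment i ends up Bad
        cap[i][sink] = 1.0 - p   # cut (paid) if comment i ends up Good
    for (i, j), w in same_label_weight.items():
        cap[i][j] += w           # cut if i and j land on different sides
        cap[j][i] += w

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = [-1] * (n + 2)
        parent[src] = src
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if parent[sink] == -1:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the augmenting path.
        bottleneck = float("inf")
        v = sink
        while v != src:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update residual capacities.
        v = sink
        while v != src:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Comments still reachable from the source lie on the Good side of the cut.
    return ["Good" if parent[i] != -1 else "Bad" for i in range(n)]


# Toy thread of three comments; the middle one is borderline for the local classifier.
local_only = min_cut_labels([0.9, 0.45, 0.8], {})
with_pairs = min_cut_labels([0.9, 0.45, 0.8], {(0, 1): 0.3, (1, 2): 0.3})
print(local_only, with_pairs)
```

In this toy thread the borderline middle comment is labeled Bad when each comment is decided in isolation, but flips to Good once the assumed pairwise penalties tie it to its confidently Good neighbours, which is the kind of more consistent global decision the abstract describes.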
