Does Similarity Matter? The Case of Answer Extraction from Technical Discussion Forums

Extracting question‐answer pairs from social media discussions has garnered much attention in recent times. Several methods proposed in the past pose this task as a post or sentence classification problem, labeling each entry as an answer or not. This paper makes the first attempt at the following two‐fold objective: (a) In all classification-based approaches to this task, one of the foremost signals used to identify answers is their similarity to the question. We study the contribution of content similarity specifically in the context of the technical problem‐solving domain. (b) We introduce hitherto unexplored features that aid in high‐precision extraction of answers, and present a thorough study of the contribution of all features to this task. Our results show that it is possible to extract answers with high accuracy using these features, even when their similarity to the question is unreliable.
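To make the notion of content similarity concrete, the following is a minimal sketch of one common way to score a candidate post against a question: cosine similarity over bag-of-words term counts. The tokenizer, the function names, and the choice of raw term counts are assumptions made for illustration; the abstract does not specify which similarity measure the paper uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(question, candidate):
    # Bag-of-words term counts for each text.
    q = Counter(tokenize(question))
    c = Counter(tokenize(candidate))
    # Dot product over the terms the two texts share.
    dot = sum(q[t] * c[t] for t in q.keys() & c.keys())
    # Product of the Euclidean norms of the two count vectors.
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in c.values())
    )
    # An empty text has zero norm; treat its similarity as 0.
    return dot / norm if norm else 0.0


# Illustrative (made-up) technical exchange: the reply may well be a valid
# answer, yet it shares almost no vocabulary with the question.
question = "My laptop will not boot after the update"
reply = "Hold the power button for ten seconds, then reinstall the driver"
score = cosine_similarity(question, reply)
```

In a problem-solving thread like this one, a helpful reply often describes a fix in words that barely overlap with the problem description, which is the situation in which a similarity score of this kind becomes an unreliable signal.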
