Unsupervised Solution Post Identification from Discussion Forums

Discussion forums have evolved into a dependable source of knowledge for solving common problems. However, only a minority of the posts in discussion forums are solution posts. Identifying solution posts in discussion forums is therefore an important research problem. In this paper, we present a technique for unsupervised solution post identification that leverages a previously unexplored textual feature: lexical correlations between problems and solutions. We use translation models to exploit these lexical correlations and language models to capture the characteristic language of solution posts. Our technique is designed to rely little on structural features such as post metadata, since such features are often not uniformly available across forums. Our clustering-based, iterative solution identification approach, built on an EM formulation, performs favorably in an empirical evaluation, beating the only unsupervised solution identification technique in the literature by a very large margin. We also show that our unsupervised technique is competitive against methods that require supervision, outperforming one such technique comfortably.
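
To make the idea concrete, the following is a minimal sketch of an iterative, two-cluster labelling loop in the spirit of the approach described above. It is not the paper's implementation: the hard-assignment (hard-EM) loop, the co-occurrence counts standing in for a trained translation model, the mixing weight `lam`, the smoothing constant `alpha`, the initial guess that the last reply in a thread is the solution, and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train(pairs, vocab_size, alpha=0.1):
    """Fit a smoothed unigram language model over replies and a
    co-occurrence table P(reply word | problem word).
    The co-occurrence table is a simplification (assumption), not a
    properly trained translation model."""
    lm = Counter()
    trans = defaultdict(Counter)
    for problem, reply in pairs:
        lm.update(reply)
        for p in set(problem):
            trans[p].update(reply)
    lm_total = sum(lm.values())
    row_totals = {p: sum(row.values()) for p, row in trans.items()}

    def lm_prob(w):
        return (lm[w] + alpha) / (lm_total + alpha * vocab_size)

    def trans_prob(w, p):
        if p not in trans:
            return 1.0 / vocab_size  # unseen problem word: uniform fallback
        return (trans[p][w] + alpha) / (row_totals[p] + alpha * vocab_size)

    return lm_prob, trans_prob


def log_score(model, problem, reply, lam=0.5):
    """Average per-word log-likelihood of a reply under one cluster,
    mixing the language model and the translation table."""
    lm_prob, trans_prob = model
    total = 0.0
    for w in reply:
        t = sum(trans_prob(w, p) for p in problem) / max(len(problem), 1)
        total += math.log(lam * lm_prob(w) + (1 - lam) * t)
    return total / max(len(reply), 1)


def identify_solutions(threads, iterations=10):
    """threads: list of (problem_tokens, [reply_tokens, ...]).
    Returns one boolean per reply, in order (True = solution cluster)."""
    pairs = [(prob, r) for prob, replies in threads for r in replies]
    vocab_size = len({w for _, r in pairs for w in r})
    # Initial guess (assumption): the last reply of each thread is a solution.
    labels = []
    for _, replies in threads:
        labels.extend(i == len(replies) - 1 for i in range(len(replies)))
    for _ in range(iterations):
        sol = [pr for pr, is_sol in zip(pairs, labels) if is_sol]
        non = [pr for pr, is_sol in zip(pairs, labels) if not is_sol]
        if not sol or not non:
            break  # one cluster is empty; nothing left to contrast
        m_sol = train(sol, vocab_size)
        m_non = train(non, vocab_size)
        new = [log_score(m_sol, p, r) > log_score(m_non, p, r) for p, r in pairs]
        if new == labels:
            break  # assignments have converged
        labels = new
    return labels
```

Calling `identify_solutions` on tokenized threads returns one flag per reply. On realistic data, the fixed `lam` and the crude initialization would likely need tuning, and a soft (probabilistic) assignment step would be closer to a standard EM formulation.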
