Hybridization of Bag-of-Words and Forum Metadata for Web Forum Question Post Detection

Background/Objective: A web forum is a problem-solving online community. Research on web forums has focused on answer mining, under the assumption that the starting post of a thread is a question post. This paper proposes methods for mining standard question posts from web forums.

Methods/Statistical Analysis: Popular methods for web forum question post detection rely on question marks, question words, higher-order n-grams and sequential pattern mining. These methods suffer from low detection rates and high implementation complexity. This paper implements a hybrid of a simple bag-of-words model with web forum metadata and a simple rule based on question marks and question words. Dimensionality reduction was performed using chi-square and wrapper techniques.

Findings: The quality of web forum question posts ranges from excellent to mediocre, or even spam, so detecting good question posts is non-trivial and requires salient features. Combining the simple question-mark and question-word rule with forum metadata performed better than either alone. Integrating the bag-of-words model with the question-mark and question-word rule and forum metadata enhances question post detection. Dimensionality reduction using chi-square was found to perform better than other popular filters such as information gain, gain ratio and symmetrical uncertainty.

Applications/Improvements: Three publicly available datasets of varying degrees of technicality were used in the experiments. The experimental results show that an enhanced bag-of-words model can outperform more complex techniques that combine higher-order n-grams with part-of-speech tagging.
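
The pipeline described above (a hybrid feature vector followed by chi-square filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the question-word list, the metadata fields `position` and `reply_count`, and all function names are hypothetical, and feature values are binarized (present if greater than zero) before computing the classic 2x2 chi-square statistic. The wrapper stage and the final classifier are not shown.

```python
import re
from collections import Counter

# Hypothetical question-word list (5W1H style); the paper's exact list is not given.
QUESTION_WORDS = {"what", "why", "how", "when", "where", "which", "who", "whom", "whose"}
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase a post and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def hybrid_features(post, vocabulary):
    """Concatenate bag-of-words counts, rule features and metadata features.

    `post` is a dict with a "text" key plus hypothetical metadata keys
    ("position" in the thread, "reply_count"); real forums expose other fields.
    """
    tokens = tokenize(post["text"])
    counts = Counter(t for t in tokens if t in vocabulary)
    # Bag-of-words: term counts over a fixed vocabulary (missing terms count 0).
    features = {f"bow:{t}": counts[t] for t in sorted(vocabulary)}
    # Simple rule features: presence of a question mark and of a question word.
    features["rule:has_qmark"] = int("?" in post["text"])
    features["rule:has_qword"] = int(any(t in QUESTION_WORDS for t in tokens))
    # Hypothetical forum metadata features.
    features["meta:is_first_post"] = int(post.get("position", 0) == 0)
    features["meta:reply_count"] = post.get("reply_count", 0)
    return features


def chi_square(present, labels):
    """Chi-square statistic of a binary feature against a binary class,
    computed from the 2x2 table (feature present/absent x question/not)."""
    a = b = c = d = 0
    for p, y in zip(present, labels):
        if p and y:
            a += 1  # feature present, question post
        elif p:
            b += 1  # feature present, non-question post
        elif y:
            c += 1  # feature absent, question post
        else:
            d += 1  # feature absent, non-question post
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_k_best(rows, labels, k):
    """Score every feature (binarized as value > 0) and keep the top k.
    Ties are broken alphabetically so the ranking is deterministic."""
    scores = {name: chi_square([row[name] > 0 for row in rows], labels)
              for name in rows[0]}
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [(name, scores[name]) for name in ranked[:k]]


if __name__ == "__main__":
    # Toy labelled posts: 1 = question post, 0 = non-question post.
    posts = [
        {"text": "How do I reset my router?", "position": 0, "reply_count": 3},
        {"text": "Why does my router keep dropping Wi-Fi?", "position": 0, "reply_count": 1},
        {"text": "Thanks, resetting the router fixed it.", "position": 2},
        {"text": "Try holding the reset button for ten seconds.", "position": 1},
    ]
    labels = [1, 1, 0, 0]
    rows = [hybrid_features(p, {"reset", "router", "thanks"}) for p in posts]
    print(select_k_best(rows, labels, 3))
```

On this toy data the rule and metadata features separate question posts from non-question posts perfectly, so they outrank the bag-of-words terms. In a fuller pipeline the selected subset would then be refined by a wrapper search and passed to a classifier.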
