Feature analysis for web forum question post detection

A web forum, also known as a discussion board or Internet forum, is an online community of users who share a common interest. It serves as a problem-solving platform that engages experts across the globe, and both technical and non-technical problems are resolved within web forums on a daily basis. Research in this domain has concentrated on answer detection, assuming that the initial post of a thread is a question post. However, the quality of web forum question posts varies from excellent to mediocre or even spam, and detecting good question posts requires the use of salient features. In this paper, we implement a bag-of-words (BoW) model to mine web forum question posts. We empirically address the following questions: Can a BoW model effectively detect web forum question posts? Which feature selection method is most appropriate for a BoW model in this domain? Is the choice of classifier influenced by the web forum genre? We used three publicly available datasets of varying degrees of technicality for the experiments. The experimental results revealed that BoW can outperform more complex techniques that combine higher-order N-grams with part-of-speech tagging.
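
To make the BoW pipeline concrete (represent each post as term counts, rank terms with a feature selection method, train a classifier on the kept terms), here is a minimal sketch. The abstract does not say which feature selection methods or classifiers were compared, so this sketch picks chi-square selection and multinomial Naive Bayes purely for illustration. The posts, labels, the choice to keep "?" as a token, and the names `bag_of_words`, `chi_square_scores`, `select_features` and `NaiveBayes` are all hypothetical, not the paper's method or data.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Unigram counts for one post. Keeping '?' as a token is an assumption of this sketch."""
    return Counter(re.findall(r"[a-z0-9']+|\?", text.lower()))

def chi_square_scores(docs, labels):
    """Chi-square statistic of term presence vs. a binary label (True = question post)."""
    n, n_pos = len(docs), sum(labels)
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = sum(1 for d, y in zip(docs, labels) if term in d and y)      # present, question
        b = sum(1 for d, y in zip(docs, labels) if term in d and not y)  # present, non-question
        c = n_pos - a                                                     # absent, question
        d_ = (n - n_pos) - b                                              # absent, non-question
        denom = (a + c) * (b + d_) * (a + b) * (c + d_)
        scores[term] = n * (a * d_ - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_features(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically for determinism."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over a fixed vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter()
            for d in class_docs:
                # Only selected features contribute to the class model.
                counts.update({t: k for t, k in d.items() if t in self.vocab})
            total = sum(counts.values())
            self.log_lik[c] = {
                t: math.log((counts[t] + 1) / (total + len(self.vocab))) for t in self.vocab
            }
        return self

    def predict(self, doc):
        def score(c):
            return self.log_prior[c] + sum(
                k * self.log_lik[c][t] for t, k in doc.items() if t in self.vocab
            )
        return max(self.classes, key=score)

# Hypothetical toy posts: True = question post, False = not a question post.
train = [
    ("How do I fix a segmentation fault when calling free() twice in C?", True),
    ("Why does my Python script raise KeyError when reading the config file?", True),
    ("What is the best way to configure nginx as a reverse proxy?", True),
    ("Buy cheap watches online now, best prices guaranteed!!!", False),
    ("lol same here", False),
    ("Check out my website for free downloads", False),
]
docs = [bag_of_words(text) for text, _ in train]
labels = [label for _, label in train]
scores = chi_square_scores(docs, labels)
vocab = select_features(scores, k=4)
model = NaiveBayes().fit(docs, labels, vocab)

print(vocab)                                                        # ['?', 'a', 'the', 'when']
print(model.predict(bag_of_words("How can I install a package?")))  # True
print(model.predict(bag_of_words("Check out the cheap watches")))   # False
```

Swapping `chi_square_scores` for another ranking (for example document frequency) or `NaiveBayes` for another classifier is one way the abstract's second and third questions could be explored on real data; a toy set this small says nothing about which combination works best.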