Identifying the role of individual user messages in an online discussion and its use in thread retrieval

Online discussion forums have become a popular medium for users to discuss topics with, and seek information from, other users with similar interests. A typical discussion thread consists of a sequence of posts written by multiple users. Each post in a thread serves a different purpose and provides a different type of information, so posts may not be equally useful for all applications. Identifying the purpose and nature of each post in a discussion thread is therefore an interesting research problem, as it can help improve information extraction and intelligent assistance techniques. We study the problem of classifying a given post according to its purpose in the discussion thread, employing features based on the post's content, the structure of the thread, the behavior of the participating users, and sentiment analysis of the post's content. We evaluate our approach on two forum data sets belonging to different genres and achieve strong classification performance. We also analyze the relative importance of the different features used for the post classification task. Finally, as a use case, we describe how post class information can help in thread retrieval by incorporating it into a state-of-the-art thread retrieval model.
