A Supervised Approach to Predict the Hierarchical Structure of Conversation Threads for Comments

User-generated texts such as comments in social media are rich sources of information. In general, the reply structure of comments is not publicly accessible on the web: websites present comments as a flat list in chronological order, so the information about which comment replies to which is lost. A solution to this problem is to reconstruct the thread structure (RTS) automatically. RTS predicts a semantic tree for the reply structure, which is useful for understanding users' behaviours and makes it easier to follow the actual conversation streams. This paper addresses the RTS task in blogs, online news agencies, and news websites. These types of websites cover a wide range of articles reflecting real-world events. People with different views participate in arguments by writing comments, which express opinions, sentiments, or ideas about the articles. The reply structure of threads on these websites is fundamentally different from that of threads in forums, chats, and emails. To perform RTS, we define a set of textual and nontextual features and then use supervised learning to combine them. The proposed method is evaluated on five different datasets, and its accuracy is compared with that of baselines. The results show that our method achieves higher accuracy than the baselines on all datasets.
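As a rough illustration of the idea of combining textual and nontextual features to pick a parent for each comment, the following minimal sketch scores every earlier comment as a candidate parent and attaches the comment to the article itself when no candidate scores above a threshold. Everything in it is an assumption made for illustration, not the paper's method: the `Comment` fields, the two features (word overlap and time proximity), the weights, and the threshold are hypothetical, and in a supervised setting the weights would be learned from threads with known reply structure rather than set by hand.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    cid: int          # comment id; 0 is reserved for the article (root)
    minutes: float    # minutes elapsed since the article was published
    text: str


def features(child, cand):
    """Two hypothetical features for a (child, candidate parent) pair."""
    a = set(child.text.lower().split())
    b = set(cand.text.lower().split())
    # Textual feature: Jaccard overlap of the two word sets.
    overlap = len(a & b) / len(a | b) if a | b else 0.0
    # Nontextual feature: candidates posted shortly before the child score higher.
    recency = 1.0 / (1.0 + (child.minutes - cand.minutes))
    return [overlap, recency]


def predict_parents(comments, weights, threshold):
    """Return {cid: predicted parent cid}, where 0 means a reply to the article."""
    ordered = sorted(comments, key=lambda c: c.minutes)
    parents = {}
    for i, child in enumerate(ordered):
        best, best_score = 0, threshold
        # Only comments posted earlier can be parents.
        for cand in ordered[:i]:
            score = sum(w * f for w, f in zip(weights, features(child, cand)))
            if score > best_score:
                best, best_score = cand.cid, score
        parents[child.cid] = best
    return parents


comments = [
    Comment(1, 0, "the new tax plan hurts small business"),
    Comment(2, 5, "great photos in this article"),
    Comment(3, 9, "the tax plan helps small business actually"),
]
# Placeholder weights and threshold; a trained model would supply these.
parents = predict_parents(comments, weights=[1.0, 0.5], threshold=0.3)
print(parents)
```

In this toy thread, the third comment shares most of its vocabulary with the first and is attached to it, while the other two fall below the threshold and are treated as replies to the article.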
