Thread Structure Learning on Online Health Forums With Partially Labeled Data

Thread structures, the reply relationships between posts in online forums, help readers understand thread content and improve the effectiveness of automated forum information retrieval, expert finding, and similar tasks. However, most online forums have only partially labeled structures: some reply relationships are known while others are not. Prior studies have learned and predicted thread structures, but existing work does not leverage the partially available structure to learn the complete thread structure. We also observe that many online health forums are person-centric: persons are mentioned across posts, which provides hints about the reply relationships between posts. In this article, we first propose to learn complete thread structures by leveraging the partially known structures with a statistical machine learning model, thread conditional random fields (threadCRF). We then propose to use person resolution, the process of identifying the same person mentioned in different contexts, together with threadCRF for thread structure learning. We have empirically verified the effectiveness of the proposed approaches.
