Tagging and Linking Web Forum Posts

We propose a method for annotating post-to-post discourse structure in online user forum data, with the aim of improving troubleshooting-oriented information access. We introduce two tasks: (1) post classification, based on a novel dialogue act tag set; and (2) link classification. We also introduce three feature sets (structural features, post context features and semantic features) and experiment with three discriminative learners (maximum entropy, SVM-HMM and CRF). We achieve above-baseline results for both dialogue act and link classification, with the best-performing feature sets differing between the two sub-tasks, and go on to perform a preliminary investigation of the interaction between post tagging and linking.
