Thread-level Analysis over Technical User Forum Data

This research focuses on improving information access over troubleshooting-oriented technical user forums via thread-level analysis. We describe a modular task formulation and a novel dataset, then report a series of preliminary classification experiments over the data. We find that a class composition strategy achieves the best results, surpassing multiclass classification approaches.
