Semi-supervised Speech Act Recognition in Emails and Forums

In this paper, we present a semi-supervised method for automatic speech act recognition in emails and forums. The major challenge of this task is the lack of labeled data in these two genres. To address this problem, our method leverages labeled data from the Switchboard-DAMSL and Meeting Recorder Dialog Act databases and applies simple domain adaptation techniques over a large amount of unlabeled email and forum data. For semi-supervised learning, the method uses automatically extracted features, such as phrases and dependency trees, which we call subtree features. Empirical results demonstrate that our model is effective in email and forum speech act recognition.
