Improving machine-learned detection of miscommunications in human-machine dialogues through informed data splitting

In this paper we study two types of machine learning techniques, rule induction and memory-based learning, for error detection in spoken dialogue systems. The learners are trained and tested on two tasks: predicting whether the current user utterance will cause problems, and identifying whether the previous user utterance has caused a problem in the ongoing dialogue. We focus on a variety of features readily available in the majority of spoken dialogue systems: dialogue history, recognized words, and prosodic characteristics of the user input. We find that the learners gain relatively little from the inclusion of prosodic features, even though at first sight the general prosodic trends in our corpus agree with earlier observations from the literature. A closer inspection of the data reveals that the prosodic feature values depend strongly on the context in which a problem occurs, represented by the type of the most recently asked system question. As a consequence, when separate classifiers are trained on subsets of the data split by system question type, the learners profit much more from prosodic information. We show that such informed splitting is beneficial for our other feature sets as well. The consequences of this approach for error detection are discussed.
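The idea of informed splitting can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the question-type names (`yes_no`, `open`), the single duration feature, the toy values, and the use of a plain 1-nearest-neighbour rule as a simple stand-in for a memory-based learner. It is not the experimental setup of the paper; it only shows how the same prosodic value can point to different classes depending on the preceding system question.

```python
from collections import defaultdict

def nearest_label(train, feats):
    """1-nearest-neighbour: label of the training item closest to feats."""
    def sq_dist(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], feats))
    return min(train, key=sq_dist)[1]

def split_by_question_type(examples):
    """Group (question_type, features, label) triples by question type."""
    subsets = defaultdict(list)
    for qtype, feats, label in examples:
        subsets[qtype].append((feats, label))
    return dict(subsets)

def predict_pooled(examples, feats):
    """Classify with one model over all data, ignoring question type."""
    return nearest_label([(f, y) for _, f, y in examples], feats)

def predict_split(subsets, qtype, feats):
    """Classify with the model trained only on the matching question type."""
    return nearest_label(subsets[qtype], feats)

# Hypothetical toy data: (question type, (utterance duration in s,), label).
# Invented assumption: a long answer is suspicious after a yes/no question,
# but normal after an open question.
EXAMPLES = [
    ("yes_no", (0.4,), "ok"),
    ("yes_no", (0.5,), "ok"),
    ("yes_no", (1.6,), "problem"),
    ("yes_no", (1.8,), "problem"),
    ("open", (1.5,), "ok"),
    ("open", (1.7,), "ok"),
    ("open", (3.5,), "problem"),
    ("open", (3.9,), "problem"),
]
SUBSETS = split_by_question_type(EXAMPLES)

query = (1.52,)  # a 1.52 s answer to a yes/no question
print(predict_pooled(EXAMPLES, query))          # pooled model
print(predict_split(SUBSETS, "yes_no", query))  # question-type-specific model
```

In this toy setting the pooled model picks the nearest neighbour from the open-question data and labels the utterance as unproblematic, whereas the classifier restricted to yes/no questions labels it as a problem. A real system would also need a fallback for question types that are absent from the training data.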
