A reranking approach for recognition and classification of speech input in conversational dialogue systems

We address the challenge of interpreting spoken input in a conversational dialogue system with an approach that exploits the close relationship between speech recognition and language understanding by modeling the two tasks jointly. Instead of using a standard pipeline in which the output of a speech recognizer is the input to a language understanding module, we merge multiple speech recognition and utterance classification hypotheses into one list that is processed by a joint reranking model. In experiments with thousands of user utterances collected from a deployed spoken dialogue system, we obtain substantially improved language understanding performance.
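
The contrast between a pipeline and a merged hypothesis list can be illustrated with a minimal sketch. It is not the model used in the paper: the abstract does not specify the reranker or its features, so the linear score combination, the weights, the toy classifier, and every name below (`JointHypothesis`, `merge_hypotheses`, `rerank`, `toy_classify`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class JointHypothesis:
    """One (transcript, utterance class) pair in the merged list (hypothetical type)."""
    transcript: str
    label: str
    asr_score: float   # assumed log-score from the speech recognizer
    nlu_score: float   # assumed log-score from the utterance classifier


def merge_hypotheses(
    asr_nbest: List[Tuple[str, float]],
    classify: Callable[[str], List[Tuple[str, float]]],
) -> List[JointHypothesis]:
    """Pair every ASR hypothesis with every class hypothesis for it."""
    merged = []
    for transcript, asr_score in asr_nbest:
        for label, nlu_score in classify(transcript):
            merged.append(JointHypothesis(transcript, label, asr_score, nlu_score))
    return merged


def rerank(
    merged: List[JointHypothesis], w_asr: float = 1.0, w_nlu: float = 1.0
) -> List[JointHypothesis]:
    """Sort the merged list by an assumed linear combination of both scores."""
    return sorted(
        merged,
        key=lambda h: w_asr * h.asr_score + w_nlu * h.nlu_score,
        reverse=True,
    )


def toy_classify(transcript: str) -> List[Tuple[str, float]]:
    """Stand-in classifier with made-up scores; a real system would use a trained model."""
    if "name" in transcript:
        return [("ask_name", -0.1), ("other", -2.5)]
    return [("other", -0.9), ("ask_name", -1.5)]


# Made-up 2-best list in which the recognizer's top hypothesis is a misrecognition.
asr_nbest = [("what is your aim", -1.0), ("what is your name", -1.2)]

# Pipeline baseline: classify only the top ASR hypothesis.
pipeline_label = toy_classify(asr_nbest[0][0])[0][0]

# Joint approach: merge all pairs into one list, then rerank.
merged = merge_hypotheses(asr_nbest, toy_classify)
best = rerank(merged)[0]

print("pipeline:", pipeline_label)
print("joint:", best.transcript, "->", best.label)
```

In this toy setup the pipeline commits to the recognizer's first hypothesis and labels it `other`, while the merged list lets a confident classification of the second hypothesis outweigh its slightly lower recognition score. A learned reranker would replace the fixed weights with parameters estimated from data.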
