Dynamic Bayesian Networks and Discriminative Classifiers for Multi-Stage Semantic Interpretation

This paper presents a multi-stage spoken language understanding system. The stochastic module is, for the first time, based on a combination of dynamic Bayesian networks (DBNs) and conditional random field (CRF) classifiers. The generative DBN models derive basic concept sequences from word sequences; the discriminative CRF models then augment these concepts with modalities and hierarchical information. To provide efficiently smoothed conditional probability estimates, the network edges are implemented as factored language models with a generalized parallel backoff procedure. This framework offers great flexibility in probability representation, which facilitates the development of the system's stochastic levels (semantic and lexical). Experiments are carried out on the French MEDIA task (tourist information and hotel booking). The MEDIA training corpus of 10k utterances is conceptually rich (more than 80 basic concepts) and comes with a manually segmented annotation. On this complex task, the proposed multi-stage system is shown to outperform the best system of the MEDIA'05 evaluation campaign (H. Bonneau-Maynard et al., 2006).
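
To make the idea of parallel backoff over a factored context more concrete, the following is a minimal sketch and not the system's actual implementation. It assumes a concept is conditioned on two parent factors (the previous concept and the current word); when the full context has not been seen often enough, it backs off to both single-factor contexts in parallel and keeps the larger estimate. The class name, the `min_count` threshold, the choice of `max` as the combination function, and the toy concept labels are all hypothetical. Discounting and renormalization, which a real factored language model applies, are deliberately omitted, so the returned values are scores rather than proper probabilities.

```python
from collections import Counter


class ParallelBackoffSketch:
    """Toy scorer for concept | (previous concept, word). Illustrative only."""

    def __init__(self, min_count=1):
        # Hypothetical threshold: back off when the full event is rarer than this.
        self.min_count = min_count
        self.joint = {"full": Counter(), "prev": Counter(), "word": Counter()}
        self.ctx = {"full": Counter(), "prev": Counter(), "word": Counter()}

    def fit(self, events):
        """events: iterable of (concept, previous_concept, word) tuples."""
        for concept, prev, word in events:
            contexts = (("full", (prev, word)), ("prev", prev), ("word", word))
            for name, key in contexts:
                self.joint[name][(concept, key)] += 1
                self.ctx[name][key] += 1
        return self

    def _rel_freq(self, name, concept, key):
        # Plain relative frequency; 0.0 when the context was never observed.
        total = self.ctx[name][key]
        return self.joint[name][(concept, key)] / total if total else 0.0

    def score(self, concept, prev, word):
        # Use the full context when it has enough support.
        if self.joint["full"][(concept, (prev, word))] >= self.min_count:
            return self._rel_freq("full", concept, (prev, word))
        # Otherwise follow both backoff paths in parallel (drop the word, or
        # drop the previous concept) and combine them with max.
        return max(
            self._rel_freq("prev", concept, prev),
            self._rel_freq("word", concept, word),
        )


# Made-up training events; the labels are not taken from the MEDIA corpus.
events = [
    ("command-task", "<s>", "reserve"),
    ("command-task", "<s>", "book"),
    ("hotel-name", "command-task", "ibis"),
    ("location-city", "command-task", "paris"),
]
model = ParallelBackoffSketch().fit(events)
seen = model.score("command-task", "<s>", "reserve")        # full context observed
backed_off = model.score("hotel-name", "command-task", "paris")  # unseen pair
```

Here `seen` comes from the full context, while `backed_off` falls back to the single-factor estimates because the pair ("command-task", "paris") never co-occurred with "hotel-name" in the toy data.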
