A Bayesian approach to semantic composition for spoken language interpretation

This paper introduces a stochastic interpretation process for composing semantic structures. Dedicated to spoken language interpretation, this process derives semantic frame structures directly from the word and basic concept sequences representing the users’ utterances. First, a two-step rule-based process is used to provide a reference semantic frame annotation of the speech training data. Then, in a decoding stage, dynamic Bayesian networks are used to hypothesize frames with confidence scores from test data. The semantic frames used in this work are derived from the Berkeley FrameNet paradigm. Experiments are reported on the MEDIA corpus, a French dialog corpus recorded using a Wizard of Oz system simulating a telephone server for tourist information and hotel booking. Manual transcriptions and annotations at the word and concept levels are available for all the data. To evaluate the robustness of the proposed approach, tests are performed under three conditions of increasing difficulty with respect to errors in the word and concept sequence inputs: (i) manually transcribed and annotated, (ii) manually transcribed and enriched with concepts provided by an automatic annotation, and (iii) fully automatically transcribed and annotated. The experimental results indicate that the proposed probabilistic framework can carry out semantic frame annotation with good reliability, comparable to that of a semi-manual rule-based approach.
