Generalization of Discriminative Approaches for Speech Language Understanding in a Multilingual Context

Probabilistic approaches are now widespread across natural language processing applications, and the choice of a particular approach usually depends on the task at hand. Targeting multilingual interpretation of speech, this paper compares the state-of-the-art methods used for machine translation and for speech understanding. This comparison motivates our proposal of a unified framework for joint decoding, which translates a sentence and assigns semantic tags to the translation in a single process. Decoding is performed with a cascade of finite-state transducers, which allows the translation and understanding hypothesis graphs to be composed. This representation is attractive because it can be generalized to allow rich transmission of information between the components of a human-machine vocal interface.
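The idea of composing a translation graph with an understanding graph can be illustrated with a minimal sketch. It assumes epsilon-free transducers with additive costs (negative log-probabilities); the French words, English words, concept labels, weights, and the helper names `compose`, `best_path`, and `sentence_acceptor` are all invented for illustration and are not taken from the paper's system.

```python
import heapq
from collections import defaultdict

# An arc is (source, input_label, output_label, cost, target).
# A transducer is (arcs, start_state, final_states). Costs are assumed to be
# negative log-probabilities, so they add along a path.

def compose(a, b):
    """Compose two epsilon-free transducers: outputs of a feed inputs of b."""
    arcs_a, start_a, finals_a = a
    arcs_b, start_b, finals_b = b
    out_a, out_b = defaultdict(list), defaultdict(list)
    for s, i, o, w, t in arcs_a:
        out_a[s].append((i, o, w, t))
    for s, i, o, w, t in arcs_b:
        out_b[(s, i)].append((o, w, t))
    start = (start_a, start_b)
    arcs, finals, stack, seen = [], set(), [start], {start}
    while stack:
        sa, sb = stack.pop()
        if sa in finals_a and sb in finals_b:
            finals.add((sa, sb))
        for i, mid, w1, ta in out_a[sa]:
            for o, w2, tb in out_b[(sb, mid)]:  # match on the shared label
                nxt = (ta, tb)
                arcs.append(((sa, sb), i, o, w1 + w2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return arcs, start, finals

def best_path(fst):
    """Dijkstra search for the cheapest accepting path (costs must be >= 0)."""
    arcs, start, finals = fst
    out = defaultdict(list)
    for s, i, o, w, t in arcs:
        out[s].append((i, o, w, t))
    heap, done, tie = [(0.0, 0, start, (), ())], set(), 0
    while heap:
        cost, _, state, ins, outs = heapq.heappop(heap)
        if state in finals:
            return cost, list(ins), list(outs)
        if state in done:
            continue
        done.add(state)
        for i, o, w, t in out[state]:
            tie += 1  # unique tie-breaker so states are never compared
            heapq.heappush(heap, (cost + w, tie, t, ins + (i,), outs + (o,)))
    return None

def sentence_acceptor(words):
    """Linear automaton that accepts exactly the given word sequence."""
    arcs = [(k, w, w, 0.0, k + 1) for k, w in enumerate(words)]
    return arcs, 0, {len(words)}

# Hypothetical one-state translation transducer (French -> English).
translate = ([(0, "je", "i", 0.1, 0), (0, "veux", "want", 0.2, 0),
              (0, "une", "a", 0.1, 0), (0, "une", "one", 0.9, 0),
              (0, "chambre", "room", 0.3, 0)], 0, {0})

# Hypothetical one-state tagger (English -> "word|concept").
understand = ([(0, "i", "i|null", 0.1, 0),
               (0, "want", "want|command", 0.2, 0),
               (0, "a", "a|number", 1.5, 0),
               (0, "one", "one|number", 0.2, 0),
               (0, "room", "room|room-type", 0.1, 0)], 0, {0})

source = sentence_acceptor(["je", "veux", "une", "chambre"])
joint = compose(compose(source, translate), understand)
cost, src_words, tagged = best_path(joint)
print(tagged)
```

With these toy weights, translation alone would prefer "a" for "une" (cost 0.1 against 0.9), but the joint search selects "one" because the tagger scores it much better as a number concept. This is the kind of interaction between components that a cascaded 1-best pipeline cannot capture.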
