A unified framework for translation and understanding allowing discriminative joint decoding for multilingual speech semantic interpretation

Highlights
- A comparison between the methods used for speech translation and understanding.
- A unified framework for translation and understanding.
- A discriminative joint decoding for multilingual speech semantic interpretation.
- The proposal is competitive with state-of-the-art techniques.
- The framework can be generalized to other components of a dialogue system.

Probabilistic approaches are now widespread in most natural language processing applications, and the choice of a particular approach usually depends on the task at hand. Targeting speech semantic interpretation in a multilingual context, this paper presents a comparison between the state-of-the-art methods used for machine translation and speech understanding. This comparison justifies our proposal of a unified framework for both tasks based on a discriminative approach. We demonstrate that this framework can be used to perform joint translation-understanding decoding, which combines the translation and semantic tagging scores of a sentence in a single process. A cascade of finite-state transducers is used to compose the translation and understanding hypothesis graphs (1-bests, word graphs or confusion networks). Not only is this proposal competitive with state-of-the-art techniques, but its framework can also be generalized to other components of human-machine vocal interfaces (e.g. the speech recognizer), allowing a richer transmission of information between them.
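The idea of combining translation and semantic tagging scores in one decoding step can be illustrated with a small sketch. It is not the paper's transducer cascade: it replaces graph composition with an exhaustive search over toy n-best lists, and the function name `joint_decode`, the weights `w_trans` and `w_tag`, and all sentences, concept labels and scores are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical n-best list: (translated sentence, translation log score).
Translations = List[Tuple[str, float]]
# Hypothetical tagger output: sentence -> [(concept sequence, tagging log score)].
Taggings = Dict[str, List[Tuple[Tuple[str, ...], float]]]


def joint_decode(
    translations: Translations,
    taggings: Taggings,
    w_trans: float = 1.0,
    w_tag: float = 1.0,
) -> Tuple[str, Tuple[str, ...], float]:
    """Return the (translation, concepts, score) triple with the best
    weighted sum of translation and tagging log scores.

    Assumption: an exhaustive search over n-best lists stands in for the
    composition of hypothesis graphs described in the abstract.
    """
    best = None
    for sentence, trans_score in translations:
        for concepts, tag_score in taggings.get(sentence, []):
            score = w_trans * trans_score + w_tag * tag_score
            if best is None or score > best[2]:
                best = (sentence, concepts, score)
    if best is None:
        raise ValueError("no translation has a tagging hypothesis")
    return best


# Toy example: the top translation is hard to tag, the second is easy.
translations = [
    ("a room in paris", -1.0),
    ("one room at paris", -1.2),
]
taggings = {
    "a room in paris": [(("number", "object", "null", "location"), -2.0)],
    "one room at paris": [(("number", "object", "null", "location"), -0.5)],
}

# Cascade baseline: commit to the 1-best translation, then tag it.
cascade_sentence = max(translations, key=lambda pair: pair[1])[0]
# Joint decoding: let the tagging score influence the translation choice.
joint_sentence, joint_concepts, joint_score = joint_decode(translations, taggings)
```

In this toy setting the cascade keeps the first translation, while the joint search prefers the second one because its combined score is higher. That is the kind of information flow between components that joint decoding is meant to allow.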
