Comparison and Combination of Lightly Supervised Approaches for Language Portability of a Spoken Language Understanding System

Portability of a spoken dialogue system (SDS) to a new domain or a new language is an active research topic, as it can reduce the time and cost of building new SDSs. In this paper we investigate several fast and efficient approaches for porting the spoken language understanding (SLU) module of a dialogue system to a new language. We show that statistical machine translation (SMT) can reduce the time and cost of porting a system from a source language to a target language. For conceptual decoding, we use a state-of-the-art module based on conditional random fields (CRF) and also evaluate a new approach based on phrase-based statistical machine translation (PB-SMT). The experimental results show that the proposed methods allow fast, low-cost language portability of SLU. In addition, we propose two methods to increase SLU robustness to translation errors. Overall, we show that combining all these approaches can further reduce the concept error rate. While most of the experiments in this paper deal with portability from French to Italian (given the availability of the French Media corpus and its subset manually translated into Italian), we finally validate our methodology on Arabic.
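Results above are reported as concept error rate (CER). CER is conventionally computed like word error rate, but over sequences of concepts: the minimum number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference concepts. The sketch below is a minimal illustration of that computation under one stated assumption: it scores concept labels only and does not check attribute values. The function name and the concept labels in the example are hypothetical.

```python
from typing import Sequence


def concept_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Levenshtein-based CER: (substitutions + deletions + insertions) / len(reference).

    Assumption: only concept labels are compared; attribute values are ignored.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one concept")
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference concepts
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis concepts
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion of a reference concept
                dist[i][j - 1] + 1,        # insertion of a spurious concept
                dist[i - 1][j - 1] + sub,  # match (cost 0) or substitution (cost 1)
            )
    return dist[n][m] / n


if __name__ == "__main__":
    # Hypothetical concept labels, for illustration only.
    ref = ["hotel-name", "city", "date"]
    hyp = ["hotel-name", "date", "price"]
    print(f"CER = {concept_error_rate(ref, hyp):.3f}")
```

In this example one reference concept is missed and one spurious concept is inserted, so two edits over three reference concepts give a CER of about 0.667.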
