High-quality speech-to-speech translation for computer-aided language learning

This article describes our research on spoken language translation as a computer aid for second language acquisition. The translation framework is incorporated into a multilingual dialogue system in which a student can engage in natural spoken interaction with the system in the foreign language and, at any time, speak a query in their native tongue to obtain a spoken translation as language assistance. The translation must therefore be of extremely high quality, but the domain is restricted. Experiments were conducted in the weather information domain, with the scenario of a native English speaker learning Mandarin Chinese. We used a large corpus of English weather-domain queries to explore and compare three translation strategies: formal, example-based, and statistical. Translation quality was manually evaluated on a test set of 695 spontaneous utterances. The best speech translation performance (89.9% correct, 6.1% incorrect, and 4.0% rejected) is achieved by a system that combines the formal and example-based methods, using parsability by a domain-specific Chinese grammar as a rejection criterion.
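The combination described above can be pictured as a cascade in which each method proposes a candidate and the target-language grammar decides whether to accept it. The following is only a minimal sketch under stated assumptions: the function names, the toy lookup tables, the set-membership "grammar", and the order in which the methods are tried are all hypothetical stand-ins and are not taken from the article.

```python
from typing import Callable, Optional, Sequence

# A translator maps an English query to a Chinese candidate, or None if it has no output.
Translator = Callable[[str], Optional[str]]


def translate_with_rejection(
    query: str,
    translators: Sequence[Translator],
    is_parsable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate accepted by the target grammar, else None (rejected)."""
    for translate in translators:
        candidate = translate(query)
        # Parsability by the target-language grammar acts as the rejection criterion.
        if candidate is not None and is_parsable(candidate):
            return candidate
    return None


# Toy stand-ins (hypothetical): real systems would use a formal generation
# pipeline, an example corpus, and a domain-specific Chinese grammar.
FORMAL_RULES = {
    "will it rain tomorrow": "明天会下雨吗",
    "how hot is it": "多热是它",  # deliberately ill-formed output
}
EXAMPLES = {
    "how hot is it": "现在有多热",
    "is it going to rain tomorrow": "明天会下雨吗",
}
ACCEPTED_BY_GRAMMAR = {"明天会下雨吗", "现在有多热"}


def formal(query: str) -> Optional[str]:
    return FORMAL_RULES.get(query)


def example_based(query: str) -> Optional[str]:
    return EXAMPLES.get(query)


def is_parsable(sentence: str) -> bool:
    return sentence in ACCEPTED_BY_GRAMMAR


if __name__ == "__main__":
    for q in ["will it rain tomorrow", "how hot is it", "what is the meaning of life"]:
        print(q, "->", translate_with_rejection(q, [formal, example_based], is_parsable))
```

In this sketch the ill-formed formal output for "how hot is it" fails the parsability check, so the example-based candidate is used instead, and a query that neither method can handle is rejected rather than translated badly.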
