Flexible Speech Translation Systems

Speech translation research has made significant progress over the years, with many high-visibility efforts showing that translating spontaneous speech between diverse languages is possible and applicable in a variety of domains. As languages and domains continue to expand, practical concerns such as the portability and reconfigurability of speech translation systems come into play: system maintenance becomes a key issue, and data is never sufficient to cover changing domains across varying languages. In this paper, we discuss strategies to overcome the limits of today's speech translation systems. In the first part, we describe our layered system architecture, which allows for easy component integration, resource sharing across components, comparison of alternative approaches, and migration toward hybrid desktop/PDA or stand-alone PDA systems. In the second part, we show how flexibility and reconfigurability are implemented by relying more radically on learning approaches, using our English–Thai two-way speech translation system as a concrete example.
