Language model adaptation for a speech to sign language translation system using web frequencies and a MAP framework

This paper presents a technique for building a new language model (LM) that adapts the target LM used by a machine translation (MT) system. The technique is especially useful when resources for training the target side (Spanish Sign Language, LSE, in our case) are too scarce to estimate the target LM, the Sign Language Model (SLM), reliably. It uses information from the source language (Spanish in our task) and from the phrase-based translation matrix to create a new LM, estimated using web frequencies, which adapts the counts of the SLM through the Maximum A Posteriori (MAP) method. The corpus consists of sentences commonly spoken by an officer when assisting people in applying for, or renewing, the National Identification Document. The proposed technique yields relative reductions of 15.5% in perplexity and 2.7% in translation WER, which are close to half the maximum improvement obtainable when only the LM is optimized.

Index Terms: language model adaptation, machine translation, sign language, web counts.
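The abstract does not give the adaptation formula, so the following is only a minimal sketch of MAP adaptation in its common count-merging form, assuming bigram models. The function name `map_adapt_bigram`, the prior weight `tau`, and the sign glosses in the usage example are all hypothetical. In-domain counts c(h, w) are combined with a prior distribution P_web(w | h), which here stands in for the web-estimated LM, as P(w | h) = (c(h, w) + tau * P_web(w | h)) / (c(h) + tau).

```python
from collections import Counter


def map_adapt_bigram(in_counts, prior_probs, tau=1.0):
    """Count-merging MAP adaptation of bigram probabilities (illustrative sketch).

    in_counts:   dict mapping (history, word) -> in-domain count c(h, w)
    prior_probs: dict mapping (history, word) -> prior probability P_web(w | h)
    tau:         assumed prior weight; larger values trust the prior more
    """
    # Total in-domain count c(h) for every history.
    history_totals = Counter()
    for (history, _word), count in in_counts.items():
        history_totals[history] += count

    adapted = {}
    # Cover bigrams seen in either source, so unseen in-domain bigrams
    # still receive probability mass from the prior.
    for history, word in set(in_counts) | set(prior_probs):
        numerator = (in_counts.get((history, word), 0)
                     + tau * prior_probs.get((history, word), 0.0))
        denominator = history_totals[history] + tau
        adapted[(history, word)] = numerator / denominator
    return adapted


# Hypothetical sign glosses, for illustration only.
in_domain = {("DNI", "RENOVAR"): 3, ("DNI", "NECESITAR"): 1}
web_prior = {
    ("DNI", "RENOVAR"): 0.5,
    ("DNI", "NECESITAR"): 0.25,
    ("DNI", "FOTO"): 0.25,
}
adapted_probs = map_adapt_bigram(in_domain, web_prior, tau=4.0)
```

If the prior sums to one over the words following a history, the adapted probabilities for that history also sum to one, and a bigram never seen in the scarce in-domain data (here `("DNI", "FOTO")`) still gets a nonzero estimate.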
