Paper information - On feature selection in maximum entropy approach to statistical concept-based speech-to-speech translation

On feature selection in maximum entropy approach to statistical concept-based speech-to-speech translation

Feature selection is critical to the performance of maximum-entropy-based statistical concept-based spoken language translation. The source language spoken message is first parsed into a structured conceptual tree, and then generated into the target language based on maximum entropy modeling. To improve feature selection in this maximum entropy approach, a new concept-word feature is proposed, which exploits both concept-level and word-level information. It thus enables the design of concise yet informative concept sets and eases both annotation and parsing efforts. The concept generation error rate is reduced by over 90% on the training set and 7% on the test set in our speech translation corpus within limited domains. To alleviate the data sparseness problem, multiple feature sets are proposed and employed, which achieve a further 10%-14% error rate reduction. Improvements are also achieved in our experiments on speech-to-speech translation.

Liang Gu | Yuqing Gao

[1] David M. Magerman. Natural Language Parsing as Statistical Pattern Recognition, 1994, ArXiv.

[2] Wolfgang Wahlster, et al. Verbmobil: Foundations of Speech-to-Speech Translation, 2000, Artificial Intelligence.

[3] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[4] Adam L. Berger, et al. A Maximum Entropy Approach to Natural Language Processing, 1996, CL.

[5] Hermann Ney, et al. Algorithms for statistical translation of spoken language, 2000, IEEE Trans. Speech Audio Process.

[6] Michael Picheny, et al. Forward-backward modeling in statistical natural concept generation for interlingua-based speech-to-speech translation, 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[7] Hitoshi Iida, et al. A Japanese-to-English speech translation system: ATR-MATRIX, 1998, ICSLP.

[8] Michael Picheny, et al. Improving statistical natural concept generation in interlingua-based speech-to-speech translation, 2003, INTERSPEECH.

[9] Michael Picheny, et al. MARS: A Statistical Semantic Parsing and Generation-Based Multilingual Automatic tRanslation System, 2002, Machine Translation.

[10] Adwait Ratnaparkhi, et al. Trainable Methods for Surface Natural Language Generation, 2000, ANLP.

[11] Michael Picheny, et al. Use of statistical N-gram models in natural language generation for machine translation, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[12] Alon Lavie, et al. Janus-III: speech-to-speech translation in multiple languages, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.