Machine Translation in Pronunciation Space

Research in the machine translation community focuses on translation in text space. However, humans are in fact also good at translating directly in pronunciation space. For some existing translation settings, such as simultaneous machine translation, translating directly in pronunciation space is inherently more natural and thus potentially more robust. In this paper, we conduct large-scale experiments on a self-built dataset of about $20$M En-Zh pairs of text sentences and their corresponding pronunciation sentences. We propose three new categories of translation: (1) translating a pronunciation sentence in the source language into a pronunciation sentence in the target language (P2P-Tran), (2) translating a text sentence in the source language into a pronunciation sentence in the target language (T2P-Tran), and (3) translating a pronunciation sentence in the source language into a text sentence in the target language (P2T-Tran). We compare all three with traditional text-to-text translation (T2T-Tran). Our experiments show that all four categories of translation achieve comparable performance, with small and sometimes negligible differences.
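
As a rough illustration of how the four categories relate, the sketch below derives one training pair per category from a single En-Zh sentence pair. It is not the authors' pipeline: the tiny lexicons, the ARPAbet-style and pinyin notations, the `|` word separator, and the names `to_pronunciation` and `build_task_pairs` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Toy, hand-written lexicons (hypothetical; a real system would use a
# grapheme-to-phoneme model or a full pronunciation dictionary).
EN_LEXICON: Dict[str, str] = {
    "i": "AY",
    "love": "L AH V",
    "you": "Y UW",
}
ZH_LEXICON: Dict[str, str] = {
    "我": "wo3",
    "爱": "ai4",
    "你": "ni3",
}


def to_pronunciation(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Map each token to its pronunciation; unknown tokens become <unk>."""
    return [lexicon.get(token, "<unk>") for token in tokens]


def build_task_pairs(en_text: str, zh_text: str) -> Dict[str, Tuple[str, str]]:
    """Return one (source, target) pair per translation category."""
    en_tokens = en_text.lower().split()  # whitespace-tokenised English words
    zh_tokens = list(zh_text)            # character-level Chinese tokens

    # Assumed surface format: English words separated by " | ",
    # Chinese syllables separated by spaces.
    en_pron = " | ".join(to_pronunciation(en_tokens, EN_LEXICON))
    zh_pron = " ".join(to_pronunciation(zh_tokens, ZH_LEXICON))

    return {
        "T2T": (en_text, zh_text),  # text -> text (traditional)
        "P2P": (en_pron, zh_pron),  # pronunciation -> pronunciation
        "T2P": (en_text, zh_pron),  # text -> pronunciation
        "P2T": (en_pron, zh_text),  # pronunciation -> text
    }


if __name__ == "__main__":
    for task, (source, target) in build_task_pairs("I love you", "我爱你").items():
        print(f"{task}: {source!r} -> {target!r}")
```

Under these assumptions, the same sequence-to-sequence model could be trained on any of the four pair types, since only the input and output vocabularies change.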
