Leveraging translations for speech transcription in low-resource settings

Recently proposed data collection frameworks for endangered language documentation aim not only to collect speech in the language of interest, but also to collect translations into a high-resource language that will render the collected resource interpretable. We focus on this scenario and explore whether we can improve transcription quality under these extremely low-resource settings with the assistance of text translations. We present a neural multi-source model and evaluate several variations of it on three low-resource datasets. We find that our multi-source model with shared attention outperforms the baselines, reducing transcription character error rate by up to 12.3%.
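To make the architecture concrete, below is a minimal sketch of one decoder step of a multi-source model with shared attention, written in PyTorch. The class names, dimensions, and the dot-product scoring function are illustrative assumptions, not details taken from the paper; the essential idea is that a single attention module, with one set of parameters, attends over both the speech encoder states and the translation encoder states, and the two resulting context vectors are fused before the decoder update.

# Minimal sketch of a multi-source decoder step with shared attention.
# Assumes PyTorch; names, dimensions, and the dot-product scorer are
# illustrative, not taken from the paper. Both encoders must produce
# states of the same dimensionality for the attention to be shared.
import torch
import torch.nn as nn


class SharedAttention(nn.Module):
    """One attention scorer whose parameters are reused for every source."""

    def __init__(self, enc_dim: int, dec_dim: int):
        super().__init__()
        self.proj = nn.Linear(dec_dim, enc_dim, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, time, enc_dim)
        query = self.proj(dec_state).unsqueeze(2)           # (batch, enc_dim, 1)
        scores = torch.bmm(enc_states, query).squeeze(2)    # (batch, time)
        weights = torch.softmax(scores, dim=-1)
        # Weighted sum of encoder states gives the context vector.
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context                                      # (batch, enc_dim)


class MultiSourceDecoderStep(nn.Module):
    """Fuses contexts from the speech and translation encoders."""

    def __init__(self, enc_dim: int, dec_dim: int, vocab_size: int):
        super().__init__()
        # The same attention instance is applied to both sources,
        # so its parameters are shared across them.
        self.attention = SharedAttention(enc_dim, dec_dim)
        self.cell = nn.LSTMCell(2 * enc_dim, dec_dim)
        self.out = nn.Linear(dec_dim, vocab_size)

    def forward(self, dec_state, dec_cell, speech_enc, trans_enc):
        ctx_speech = self.attention(dec_state, speech_enc)
        ctx_trans = self.attention(dec_state, trans_enc)
        fused = torch.cat([ctx_speech, ctx_trans], dim=-1)
        dec_state, dec_cell = self.cell(fused, (dec_state, dec_cell))
        return self.out(dec_state), dec_state, dec_cell

One plausible motivation for sharing the attention parameters, rather than learning a separate attention per source, is that it keeps the parameter count low, which matters when only a few hours of transcribed speech are available for training.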
