Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation
Junkun Chen | Renjie Zheng | Mingbo Ma | Liang Huang
[1] Alexei Baevski, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.
[2] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[3] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[4] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[5] Jiajun Zhang, et al. Bridging the Modality Gap for Speech-to-Text Translation, 2020, ArXiv.
[6] Mingxuan Wang, et al. Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation, 2021, AAAI.
[7] Qiantong Xu, et al. Self-Training for End-to-End Speech Translation, 2020, INTERSPEECH.
[8] Sebastian Ruder, et al. Fine-tuned Language Models for Text Classification, 2018, ArXiv.
[9] Mattia Antonino Di Gangi, et al. MuST-C: a Multilingual Speech Translation Corpus, 2019, NAACL.
[10] Shang-Wen Li, et al. TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech, 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[11] Kenneth Ward Church, et al. Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training, 2020, FINDINGS.
[12] Junkun Chen, et al. Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR, 2021, FINDINGS.
[13] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[14] Kevin Duh, et al. ESPnet-ST: All-in-One Speech Translation Toolkit, 2020, ACL.
[15] Liang Huang, et al. MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation, 2020, ArXiv.
[16] Matthias Sperber, et al. Neural Lattice-to-Sequence Models for Uncertain Inputs, 2017, EMNLP.
[17] Yu Sun, et al. ERNIE: Enhanced Representation through Knowledge Integration, 2019, ArXiv.
[18] Armand Joulin, et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision, 2020, 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[20] Lin-Shan Lee, et al. SpeechBERT: An Audio-and-Text Jointly Learned Language Model for End-to-End Spoken Question Answering, 2019, INTERSPEECH.
[21] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[22] Yuan Cao, et al. Leveraging Weakly Supervised Data to Improve End-to-end Speech-to-text Translation, 2018, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[24] Olivier Pietquin, et al. End-to-End Automatic Speech Translation of Audiobooks, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Ali Can Kocabiyikoglu, et al. Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation, 2018, LREC.
[26] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Jiajun Zhang, et al. End-to-End Speech Translation with Knowledge Distillation, 2019, INTERSPEECH.
[28] Philipp Koehn, et al. Europarl: A Parallel Corpus for Statistical Machine Translation, 2005, MTSUMMIT.
[29] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.
[30] Navdeep Jaitly, et al. Sequence-to-Sequence Models Can Directly Translate Foreign Speech, 2017, INTERSPEECH.
[31] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[32] Wilson L. Taylor, et al. “Cloze Procedure”: A New Tool for Measuring Readability, 1953.
[33] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[34] Mingxuan Wang, et al. Consecutive Decoding for Speech-to-text Translation, 2021, AAAI.