Lightweight Adapter Tuning for Multilingual Speech Translation

Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists of freezing the pretrained parameters of a model and injecting lightweight modules between layers, adding only a small number of task-specific trainable parameters. While adapter tuning has been investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for multilingual speech translation (ST). Starting from different pre-trained models (a multilingual ST model trained on parallel data, or a multilingual BART (mBART) model trained on non-parallel multilingual data), we show that adapters can be used to: (a) efficiently specialize ST to specific language pairs at a low extra cost in parameters, and (b) transfer from an automatic speech recognition (ASR) task and an mBART pre-trained model to a multilingual ST task. Experiments show that adapter tuning offers results competitive with full fine-tuning while being much more parameter-efficient.
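
To make the idea concrete, below is a minimal sketch of adapter tuning in PyTorch, assuming the common residual bottleneck design (down-projection, non-linearity, up-projection) inserted after frozen Transformer layers; names such as d_model, d_bottleneck, and the "adapter" parameter-name convention are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter placed after a (frozen) Transformer sub-layer."""
    def __init__(self, d_model: int, d_bottleneck: int):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model)       # normalize the frozen layer's output
        self.down = nn.Linear(d_model, d_bottleneck)  # down-projection to a small bottleneck
        self.up = nn.Linear(d_bottleneck, d_model)    # up-projection back to the model dimension
        self.activation = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter only learns a small correction
        # on top of the frozen pretrained representation.
        return x + self.up(self.activation(self.down(self.layer_norm(x))))

def freeze_backbone(model: nn.Module) -> None:
    # Freeze all pretrained parameters; only parameters whose name contains
    # "adapter" (a hypothetical naming convention) remain trainable.
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```

With a hidden size of 1024 and a bottleneck of 64, each adapter adds roughly 2 * 1024 * 64 weights per layer, which is why specializing a multilingual model to a language pair this way is far cheaper than duplicating or fully fine-tuning the whole network.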
