Lightweight Adapter Tuning for Multilingual Speech Translation

Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists of freezing the pretrained parameters of a model and injecting lightweight modules between layers, adding only a small number of task-specific trainable parameters. While adapter tuning has been investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for multilingual speech translation (ST). Starting from different pre-trained models (a multilingual ST model trained on parallel data, or a multilingual BART (mBART) model trained on non-parallel multilingual data), we show that adapters can be used to: (a) efficiently specialize ST to specific language pairs at a low extra cost in parameters, and (b) transfer from an automatic speech recognition (ASR) task and an mBART pre-trained model to a multilingual ST task. Experiments show that adapter tuning offers results competitive with full fine-tuning while being much more parameter-efficient.
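
To make the idea concrete, below is a minimal sketch of adapter tuning in PyTorch, assuming the common residual bottleneck design (down-projection, non-linearity, up-projection) inserted after frozen Transformer layers; names such as d_model, d_bottleneck, and the "adapter" parameter-name convention are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter placed after a (frozen) Transformer sub-layer."""
    def __init__(self, d_model: int, d_bottleneck: int):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model)       # normalize the frozen layer's output
        self.down = nn.Linear(d_model, d_bottleneck)  # down-projection to a small bottleneck
        self.up = nn.Linear(d_bottleneck, d_model)    # up-projection back to the model dimension
        self.activation = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter only learns a small correction
        # on top of the frozen pretrained representation.
        return x + self.up(self.activation(self.down(self.layer_norm(x))))

def freeze_backbone(model: nn.Module) -> None:
    # Freeze all pretrained parameters; only parameters whose name contains
    # "adapter" (a hypothetical naming convention) remain trainable.
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```

With a hidden size of 1024 and a bottleneck of 64, each adapter adds roughly 2 * 1024 * 64 weights per layer, which is why specializing a multilingual model to a language pair this way is far cheaper than duplicating or fully fine-tuning the whole network.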
