Finnish ASR with Deep Transformer Models

Recently, BERT and Transformer-XL based architectures have achieved strong results in a range of NLP applications. In this paper, we explore Transformer architectures, BERT and Transformer-XL, as language models for a Finnish ASR task with different rescoring schemes. With Transformer-XL we achieve strong results on both an intrinsic and an extrinsic task, obtaining 29% better perplexity and 3% better WER than our previous best LSTM-based approach. We also introduce a novel three-pass decoding scheme which improves ASR performance by 8%. To the best of our knowledge, this is also the first work (i) to formulate an alpha smoothing framework for using the non-autoregressive BERT language model in an ASR task, and (ii) to explore sub-word units with Transformer-XL for an agglutinative language like Finnish.
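To give a concrete sense of how a non-autoregressive BERT model can score ASR hypotheses, the sketch below shows pseudo-log-likelihood scoring of an N-best hypothesis with a masked language model, interpolated with the first-pass score via a weight alpha. This is a minimal illustration under assumptions, not the paper's exact alpha smoothing formulation: the FinBERT checkpoint name (TurkuNLP/bert-base-finnish-cased-v1), the interpolation form, and the helper names are illustrative choices.

    # Minimal sketch (assumption): masked-LM pseudo-log-likelihood rescoring
    # of an ASR hypothesis, interpolated with the first-pass score by alpha.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    MODEL = "TurkuNLP/bert-base-finnish-cased-v1"  # illustrative Finnish BERT checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForMaskedLM.from_pretrained(MODEL)
    model.eval()

    def bert_pseudo_loglik(sentence: str) -> float:
        """Sum log-probabilities of each token when it is masked in turn."""
        ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
        total = 0.0
        for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
        return total

    def rescore(first_pass_score: float, hypothesis: str, alpha: float = 0.5) -> float:
        """Interpolate the first-pass score with the BERT pseudo-log-likelihood."""
        return (1.0 - alpha) * first_pass_score + alpha * bert_pseudo_loglik(hypothesis)

In an N-best rescoring setup, each hypothesis from the first decoding pass would be passed through rescore() and the list re-ranked by the combined score; alpha would be tuned on a development set.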
