Dialectal Speech Recognition and Translation of Swiss German Speech to Standard German Text: Microsoft's Submission to SwissText 2021 (short paper)

This paper describes the winning approach to Shared Task 3 at SwissText 2021 on Swiss German Speech to Standard German Text, a public competition on dialect recognition and translation. Swiss German refers to the multitude of Alemannic dialects spoken in the German-speaking parts of Switzerland. It differs significantly from Standard German in pronunciation, vocabulary and grammar, is mostly incomprehensible to native speakers of Standard German, and lacks a standardized written form. To solve this challenging task, we propose a hybrid automatic speech recognition system with a lexicon that incorporates translations, a first-pass language model that deals with Swiss German particularities, a transfer-learned acoustic model and a strong neural language model for second-pass rescoring. Our submission reaches 46.04% BLEU on a blind conversational test set and outperforms the second-best competitor by a 12%
