Paper information - Dialectal Speech Recognition and Translation of Swiss German Speech to Standard German Text: Microsoft's Submission to SwissText 2021 (short paper)

This paper describes the winning approach in Shared Task 3 at SwissText 2021 on Swiss German Speech to Standard German Text, a public competition on dialect recognition and translation. Swiss German refers to the multitude of Alemannic dialects spoken in the German-speaking parts of Switzerland. It differs significantly from Standard German in pronunciation, vocabulary and grammar, and is mostly incomprehensible to native German speakers. Moreover, it lacks a standardized written form. To solve this challenging task, we propose a hybrid automatic speech recognition system with a lexicon that incorporates translations, a first-pass language model that deals with Swiss German particularities, a transfer-learned acoustic model and a strong neural language model for second-pass rescoring. Our submission reaches 46.04% BLEU on a blind conversational test set and outperforms the second-best competitor by a 12%
