Seed Words Based Data Selection for Language Model Adaptation

We address the problem of language model customization in applications where the ASR component must handle domain-specific terminology; although current state-of-the-art speech recognition technology provides excellent results on generic domains, adaptation to specialized dictionaries or glossaries remains an open issue. In this work we present an approach for automatically selecting sentences from a text corpus that match, both semantically and morphologically, a glossary of terms (words or compound words) provided by the user. The final goal is to rapidly adapt the language model of a hybrid ASR system with a limited amount of in-domain text data, so that the system can successfully cope with the linguistic domain at hand; the vocabulary of the baseline model is expanded and tailored, reducing the resulting OOV rate. We introduce and discuss data selection strategies based on shallow morphological seeds and on semantic similarity via word2vec; the experimental setting is a simultaneous interpreting scenario in which ASR systems in three languages must recognize domain-specific terms (here, dentistry). Results under several metrics (OOV rate, WER, precision and recall) show the effectiveness of the proposed techniques.
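The two selection criteria described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy embedding table, the prefix length, and the similarity threshold are all assumptions standing in for a word2vec model trained on real text and for whatever morphological rules the authors actually use.

```python
import math

# Hypothetical toy vectors standing in for word2vec embeddings;
# in practice these would come from a trained word2vec model.
EMBED = {
    "implant": [0.90, 0.10, 0.00],
    "crown":   [0.80, 0.20, 0.10],
    "weather": [0.00, 0.10, 0.90],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def morph_match(word, seeds, prefix_len=5):
    """Shallow morphological match: the word shares a character
    prefix with some glossary seed (catches inflected variants)."""
    return any(word[:prefix_len] == s[:prefix_len] for s in seeds)

def select_sentences(sentences, seeds, sim_threshold=0.8):
    """Keep sentences containing at least one word that matches a
    glossary seed either morphologically (shared prefix) or
    semantically (word2vec cosine similarity above a threshold)."""
    selected = []
    for sent in sentences:
        for w in sent.lower().split():
            if morph_match(w, seeds):
                selected.append(sent)
                break
            if w in EMBED and any(
                s in EMBED and cosine(EMBED[w], EMBED[s]) >= sim_threshold
                for s in seeds
            ):
                selected.append(sent)
                break
    return selected
```

With the seed `"implant"`, a sentence containing the inflected form `"implants"` is kept by the prefix rule, a sentence containing `"crown"` is kept by the semantic rule, and an off-domain sentence about the weather is discarded; the retained sentences would then feed the LM adaptation step.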
