Unsupervised and User Feedback Based Lexicon Adaptation for Foreign Names and Acronyms

In this work we evaluate a set of lexicon adaptation methods for improving the recognition of foreign names and acronyms in automatic speech recognition (ASR). The most likely foreign names and acronyms are selected from the language model (LM) training corpus based on typographic information and letter n-gram perplexity. Adapted pronunciation rules are generated for the selected foreign name candidates using a statistical grapheme-to-phoneme (G2P) model. A rule-based method is used for pronunciation adaptation of acronym candidates. In addition to unsupervised lexicon adaptation, we also evaluate an adaptation method based on speech data and user-corrected ASR transcripts. Pronunciation variants for foreign name candidates are retrieved using forced alignment and second-pass decoding over partial audio segments. The optimal pronunciation variants are collected and used for future pronunciation adaptation of foreign names.
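The candidate selection step can be illustrated with a minimal sketch. It assumes a letter bigram model with add-one smoothing trained on native-language words, and it treats capitalisation as the typographic cue; the abstract does not state the n-gram order, the smoothing, or the exact typographic features, so these choices and all function names here are hypothetical.

```python
import math
from collections import Counter


def train_letter_bigram(words):
    """Count letter bigrams (with word-boundary markers) over native words."""
    bigrams, contexts, alphabet = Counter(), Counter(), set()
    for w in words:
        s = "^" + w.lower() + "$"
        alphabet.update(s)
        for a, b in zip(s, s[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, alphabet


def perplexity(word, model):
    """Per-letter perplexity of a word under the add-one smoothed bigram model."""
    bigrams, contexts, alphabet = model
    s = "^" + word.lower() + "$"
    v = len(alphabet) + 1  # one extra slot reserved for unseen letters (assumption)
    logp = 0.0
    for a, b in zip(s, s[1:]):
        logp += math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
    return math.exp(-logp / (len(s) - 1))


def foreign_candidates(tokens, model, threshold):
    """Keep capitalised tokens whose letter perplexity exceeds the threshold."""
    return [t for t in tokens
            if t[:1].isupper() and perplexity(t, model) > threshold]


# Toy native (Finnish-like) word list, for illustration only.
native = ["talo", "katu", "kala", "sana", "maa",
          "tuli", "kukka", "lintu", "koulu", "aamu"]
model = train_letter_bigram(native)
```

Under this sketch, a word whose letter sequences are rare in the native training data receives a higher perplexity and is more likely to be flagged as a foreign name candidate; the threshold would need to be tuned on real data.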
