Adaptation of Morph-Based Speech Recognition for Foreign Names and Acronyms

In this paper, we improve a morph-based speech recognition system by focusing adaptation efforts on acronyms (ACRs) and foreign proper names (FPNs). An unsupervised language model (LM) adaptation framework based on two-pass decoding is used, and vocabulary adaptation is applied alongside it. The aim is to improve both language and pronunciation modeling for FPNs and ACRs. A selection algorithm finds the most likely topically related foreign words and acronyms in in-domain text, and new pronunciation rules are generated for the selected words. Different kinds of morpheme adaptation operations are also evaluated on the ACR and FPN candidate words to determine which yields the largest gain from pronunciation adaptation. Statistically significant improvements in average word error rate (WER) and term error rate (TER) are achieved by combining unsupervised LM adaptation with vocabulary adaptation focused on ACRs and FPNs.
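As an illustration of the WER metric reported above, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The function name and the example sentences are hypothetical and not taken from the paper; the paper's exact scoring setup and its TER definition are not given in this abstract, so TER is not sketched here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: whitespace tokenization is an assumption, and the
    paper's actual scoring tool may normalize text differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one misrecognized acronym and one dropped word.
wer = word_error_rate("the nato summit in helsinki", "the natto summit helsinki")
```

In this example the hypothesis contains one substitution and one deletion against a five-word reference, so two of five reference words are in error.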
