Out-of-Vocabulary Word Recovery using FST-Based Subword Unit Clustering in a Hybrid ASR System

The paper presents a new approach to extracting useful information from out-of-vocabulary (OOV) speech regions in ASR system output. The system uses a hybrid decoding network that contains both words and sub-word units. In the decoded lattices, candidates for OOV regions are identified as sub-graphs of sub-word units. To facilitate OOV word recovery, we search for recurring OOVs by clustering the detected OOV candidates. The clustering metric is based on a comparison of the sub-graphs corresponding to the OOV candidates. The proposed method discovers repeating out-of-vocabulary words and finds their graphemic representation more robustly than conventional techniques that take into account only the one-best sub-word string hypothesis.
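The benefit of comparing whole sub-graphs rather than one-best strings can be illustrated with a small sketch. It is not the paper's FST-based implementation: each OOV candidate is assumed to be a plain list of alternative phoneme paths (a stand-in for a lattice sub-graph), the distance is assumed to be the smallest length-normalized edit distance over all path pairs, and the clustering is simple single-linkage with a hypothetical threshold of 0.25. All names and data below are illustrative.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two sequences of sub-word units."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def candidate_distance(c1, c2):
    """Smallest length-normalized edit distance over all pairs of paths."""
    return min(edit_distance(p, q) / max(len(p), len(q)) for p in c1 for q in c2)

def cluster(candidates, threshold=0.25):
    """Single-linkage clustering: link candidates closer than the threshold."""
    parent = list(range(len(candidates)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(candidates)), 2):
        if candidate_distance(candidates[i], candidates[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(candidates)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical OOV candidates; the first path of each list plays the one-best role.
candidates = [
    [("k", "ao", "l", "d", "iy"), ("k", "aa", "l", "d", "iy")],
    [("g", "aa", "l", "d", "iy"), ("k", "aa", "l", "d", "iy")],
    [("m", "ih", "s", "t", "er")],
]

multi_path = cluster(candidates)                    # uses every alternative path
one_best = cluster([c[:1] for c in candidates])     # uses only the first path
print(multi_path)  # [[0, 1], [2]]
print(one_best)    # [[0], [1], [2]]
```

In this toy data the first two candidates share an alternative path, so comparing all paths groups them together, while their one-best strings differ in two of five units and stay apart at the same threshold.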
