Finding recurrent out-of-vocabulary words

Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Multiple instances of the same OOV word provide valuable information for estimating the word's pronunciation or part-of-speech (POS) tag. A conventional OOV word detection system, however, recognizes and treats each OOV word individually. We therefore investigate how to identify recurrent OOV words in speech recognition. Specifically, we propose to cluster multiple instances of the same OOV word using a bottom-up approach. Phonetic, acoustic and contextual features are collected to measure the distance between OOV candidates. The experimental results show that bottom-up clustering is effective at detecting the recurrence of OOV words. We also find that the phonetic feature outperforms the acoustic and contextual features, and that the best performance is achieved when all features are combined.
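
As an illustration of the bottom-up idea, the following is a minimal sketch and not the system described above. It uses only a phonetic distance (a length-normalized edit distance over phoneme sequences), whereas the abstract also names acoustic and contextual features. The function names, the average-linkage rule, the stopping threshold of 0.35 and the example phoneme strings are all assumptions made for this example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def phonetic_distance(a, b):
    """Edit distance normalized by the longer sequence (assumed normalization)."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


def cluster_oov_candidates(candidates, threshold=0.35):
    """Bottom-up clustering: start with one cluster per candidate and keep
    merging the closest pair until the closest pair is farther apart than
    the (hypothetical) threshold. Returns lists of candidate indices."""
    clusters = [[i] for i in range(len(candidates))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(phonetic_distance(candidates[i], candidates[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters


# Hypothetical decoded phoneme strings for five OOV candidates.
oov_candidates = [
    ["OW", "B", "AA", "M", "AH"],
    ["OW", "B", "AA", "M", "AA"],
    ["K", "R", "EH", "M", "L", "IH", "N"],
    ["K", "R", "EH", "M", "L", "AH", "N"],
    ["OW", "B", "AO", "M", "AH"],
]
oov_clusters = cluster_oov_candidates(oov_candidates)
print(oov_clusters)
```

In this toy input, candidates 0, 1 and 4 differ from one another by one or two phonemes and end up in one cluster, while candidates 2 and 3 form another. Acoustic or contextual distances could be folded in by replacing `phonetic_distance` with a weighted sum of several distances, though how the features are actually combined is not specified in the abstract.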
