Learning better lexical properties for recurrent OOV words

Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Such multiple instances of the same OOV word provide valuable information for learning the lexical properties of the word. Therefore, we investigated how to estimate better pronunciation, spelling and part-of-speech (POS) label for recurrent OOV words. We first identified recurrent OOV words from the output of a hybrid decoder by applying a bottom-up clustering approach. Then, multiple instances of the same OOV word were used simultaneously to learn properties of the OOV word. The experimental results showed that the bottom-up clustering approach is very effective at detecting the recurrence of OOV words. Furthermore, by using evidence from multiple instances of the same word, the pronunciation accuracy, recovery rate and POS label accuracy of recurrent OOV words can be substantially improved.

[1]  T. K. Vintsyuk Speech discrimination by dynamic programming , 1968 .

[2]  William M. Rand,et al.  Objective Criteria for the Evaluation of Clustering Methods , 1971 .

[3]  Michael J. Fischer,et al.  The String-to-String Correction Problem , 1974, JACM.

[4]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[5]  L. Hubert,et al.  Comparing partitions , 1985 .

[6]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[7]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[8]  Dietrich Klakow,et al.  OOV-detection in large vocabulary system using automatically defined word-fragments as fillers , 1999, EUROSPEECH.

[9]  James R. Glass,et al.  Modeling out-of-vocabulary words for robust speech recognition , 2000, INTERSPEECH.

[10]  Thomas Schaaf Detection of OOV words using generalized word models and a semantic class language model , 2001, INTERSPEECH.

[11]  Peder A. Olsen,et al.  Theory and practice of acoustic confusability , 2002, Comput. Speech Lang..

[12]  James Glass,et al.  Modelling out-of-vocabulary words for robust speech recognition , 2002 .

[13]  Lucian Galescu Recognition of out-of-vocabulary words with sub-lexical language models , 2003, INTERSPEECH.

[14]  Stanley F. Chen,et al.  Conditional and joint models for grapheme-to-phoneme conversion , 2003, INTERSPEECH.

[15]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[16]  Hermann Ney,et al.  Open vocabulary speech recognition with flat hybrid models , 2005, INTERSPEECH.

[17]  Jithendra Vepa,et al.  Using posterior-based features in template matching for speech recognition , 2006, INTERSPEECH.

[18]  J. Ajmera,et al.  Phonetic Distance Measures for Speech Recognition Vocabulary and Grammar Optimization , 2007 .

[19]  Ashish Verma,et al.  Keyword Search using Modified Minimum Edit Distance Measure , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[20]  Hermann Ney,et al.  Joint-sequence models for grapheme-to-phoneme conversion , 2008, Speech Commun..

[21]  James R. Glass,et al.  Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[22]  Bhuvana Ramabhadran,et al.  Towards using hybrid word and fragment units for vocabulary independent LVCSR systems , 2009, INTERSPEECH.

[23]  Timothy J. Hazen,et al.  Query-by-example spoken term detection using phonetic posteriorgram templates , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[24]  James Bailey,et al.  Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance , 2010, J. Mach. Learn. Res..

[25]  Mark Dredze,et al.  Contextual Information Improves OOV Detection in Speech , 2010, NAACL.

[26]  Alexander I. Rudnicky,et al.  OOV Detection and Recovery Using Hybrid Models with Different Fragments , 2011, INTERSPEECH.

[27]  Hermann Ney,et al.  Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German LVCSR , 2011, INTERSPEECH.

[28]  Owen Kimball,et al.  Subword speech recognition for detection of unseen words , 2012, INTERSPEECH.

[29]  Alexander I. Rudnicky,et al.  System combination for out-of-vocabulary word detection , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[30]  Alexander I. Rudnicky,et al.  OOV Word Detection using Hybrid Models with Mixed Types of Fragments , 2012, INTERSPEECH.

[31]  Alexander I. Rudnicky,et al.  Finding recurrent out-of-vocabulary words , 2013, INTERSPEECH.