Subword-based modeling for handling OOV words in keyword spotting

This work compares ASR decoding at different subword levels, crossed with alternative keyword search strategies, to handle the out-of-vocabulary (OOV) problem in low-resource keyword spotting. We show that morpheme-based subword modeling is effective in recovering OOV keywords in a Turkish low-resource keyword spotting task, where a mixed word-and-morpheme decoding approach outperforms the traditional subword-based search in which word-decoded lattices are broken down into subword lattices. Furthermore, unsupervised learning of morphology works almost as well as a rule-based system designed for the language, despite the low-resource condition. A staged keyword search strategy benefits from both methods of morphological analysis.
