Thai Speech Keyword Spotting using Heterogeneous Acoustic Modeling

This paper describes keyword spotting with acoustic models of three different structures: syllables, fillers, and keywords. Filler models and syllable models capture out-of-vocabulary words, while keyword models extract significant words from speech utterances. Grammatical details are used with the syllable models to add extra domain constraints, which improves the system's ability to detect non-keyword vocabulary. Filler models combined with syllable models reduce false alarms in keyword detection. Three kinds of filler models are described; they perform differently on utterances containing a single keyword and on utterances containing multiple keywords. Experiments are conducted in a domain of telephone call transfer via spoken Thai. The proposed method is compared with limited vocabulary speech recognition (LVSR) and with keyword spotting using a reward function. For single-keyword utterances, the best accuracy obtained with the proposed method is approximately 70%, which is better than the accuracies obtained from LVSR and from reward-function spotting. For multiple-keyword utterances, the best precision and recall rates are 72% and 65%, respectively. These are marginally better than the rates obtained from LVSR, while the typical reward-function approach yields rates of less than 50%.
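As an illustration of the precision and recall figures quoted for multiple-keyword utterances, the following minimal sketch computes both rates from per-utterance keyword lists. It assumes a detection counts as correct whenever the same keyword appears in the reference transcript of that utterance, with no check of timing; the scoring protocol actually used in the paper may differ. The function name and the example keywords are hypothetical and are not taken from the paper.

```python
from collections import Counter


def precision_recall(reference, detected):
    """Return (precision, recall) pooled over all utterances.

    reference, detected: lists of keyword lists, one inner list per utterance.
    Assumption: a detection is a hit if the keyword also occurs in the
    reference for the same utterance (timing is ignored).
    """
    hits = n_detected = n_reference = 0
    for ref, det in zip(reference, detected):
        ref_counts, det_counts = Counter(ref), Counter(det)
        # Multiset intersection counts each reference keyword at most once.
        hits += sum((ref_counts & det_counts).values())
        n_detected += sum(det_counts.values())
        n_reference += sum(ref_counts.values())
    precision = hits / n_detected if n_detected else 0.0
    recall = hits / n_reference if n_reference else 0.0
    return precision, recall


# Hypothetical call-transfer keywords, for illustration only.
reference = [["sales", "transfer"], ["support"], ["operator", "billing"]]
detected = [["sales"], ["support", "billing"], ["operator"]]
precision, recall = precision_recall(reference, detected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this scoring, a false alarm (a detected keyword that is absent from the reference) lowers precision, while a missed keyword lowers recall.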
