An approach for spoken term detection based on modified Gaussian posteriorgrams

Query-by-Example Spoken Term Detection (QbE-STD) has been an active research topic in speech recognition. Because the template representation is a key component of QbE-STD, many researchers have worked on developing effective template representations to improve performance. Gaussian posteriorgrams have been widely used because the Gaussian mixture model (GMM) that generates them is convenient and easy to train; however, their performance is not entirely satisfactory. In this paper, we use as the template representation a modified Gaussian posteriorgram built with a proposed Gaussian component selection algorithm, which emphasizes the discrimination among queries. The selection algorithm is inspired by the TF-IDF (term frequency-inverse document frequency) concept, well known in information retrieval and text indexing. Experiments on the TIMIT corpus show that, with our approach, P@N (precision at N) increased by 12% and the equal error rate (EER) decreased by 10%.
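For readers unfamiliar with the representation, a Gaussian posteriorgram is the sequence of per-frame posterior vectors P(k | x_t) over the K components of a GMM. The following is a minimal sketch, assuming a GMM with diagonal covariances that has already been trained on unlabeled acoustic frames (e.g. MFCCs); the function name and argument layout are illustrative and not taken from the paper.

```python
import numpy as np


def gaussian_posteriorgram(frames, weights, means, variances):
    """Return a (T, K) matrix whose row t holds P(k | x_t) for each of K
    diagonal-covariance Gaussian components.

    frames:    (T, D) acoustic feature vectors (e.g. MFCCs)
    weights:   (K,)   mixture weights summing to 1
    means:     (K, D) component means
    variances: (K, D) diagonal covariances (all entries > 0)
    """
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    # Log-density of every frame under every diagonal Gaussian.
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    log_joint = np.log(weights)[None, :] - 0.5 * (log_det[None, :] + mahal)
    # Bayes' rule with a log-sum-exp normaliser, so each row sums to 1.
    m = log_joint.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.sum(np.exp(log_joint - m), axis=1, keepdims=True))
    return np.exp(log_joint - log_norm)


# Toy usage with random features and a 3-component GMM.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 3))
weights = np.array([0.5, 0.3, 0.2])
means = rng.normal(size=(3, 3))
variances = np.ones((3, 3))
P = gaussian_posteriorgram(frames, weights, means, variances)
print(P.shape)  # (5, 3)
```

Working in the log domain avoids underflow when feature dimensions are large and densities become very small.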
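The abstract does not spell out the Gaussian component selection algorithm, so the following is only a hypothetical illustration of how a TF-IDF analogy could be applied, not the authors' method. Each query posteriorgram is treated as a "document" and each Gaussian component as a "term": term frequency is the component's mean posterior within a query, and document frequency counts the queries in which that mean exceeds a threshold. The names `select_components` and `modified_posteriorgram`, the `presence` threshold, the max-over-queries score, and the renormalisation step are all assumptions of this sketch.

```python
import numpy as np


def select_components(query_grams, num_keep, presence=0.05):
    """Hypothetical TF-IDF-style ranking of Gaussian components.

    query_grams: list of (T_q, K) posteriorgrams, one per query ("document")
    num_keep:    number of components ("terms") to keep
    presence:    assumed threshold on mean posterior for counting a
                 component as occurring in a query (document frequency)
    Returns the indices of the selected components, best first.
    """
    # tf[q, k]: average posterior mass of component k over query q's frames.
    tf = np.stack([g.mean(axis=0) for g in query_grams])   # (N, K)
    n_queries = tf.shape[0]
    # df[k]: number of queries in which component k is "present".
    df = np.count_nonzero(tf > presence, axis=0)            # (K,)
    idf = np.log(n_queries / np.maximum(df, 1))             # (K,)
    # A component scores highly if it is prominent in some query
    # but not spread across all of them.
    score = (tf * idf).max(axis=0)                          # (K,)
    return np.argsort(-score, kind="stable")[:num_keep]


def modified_posteriorgram(gram, keep, eps=1e-6):
    """Restrict a (T, K) posteriorgram to the kept components and
    renormalise each frame so it sums to 1 again."""
    sub = gram[:, keep] + eps  # eps keeps every row strictly positive
    return sub / sub.sum(axis=1, keepdims=True)


# Toy usage: component 2 is shared by both queries, so its IDF is zero.
queries = [np.array([[0.7, 0.0, 0.3], [0.7, 0.0, 0.3]]),
           np.array([[0.0, 0.6, 0.4]])]
keep = select_components(queries, num_keep=2)
print(keep)  # [0 1]
```

In this toy case the component that every query shares (for example, something like silence) receives zero weight, which matches the stated goal of emphasizing what discriminates among queries.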
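The two reported metrics can be computed from a ranked list of detection scores. The sketch below assumes the usual QbE-STD definition of P@N (precision among the N highest-scoring candidates, where N is the number of true occurrences of the query) and that a higher score means a more likely hit, so DTW distances would need to be negated first. The function names are illustrative, and the EER is approximated by sweeping the decision threshold over the observed scores.

```python
import numpy as np


def precision_at_n(scores, labels):
    """P@N: precision among the N highest-scoring candidates, where N is
    the number of true occurrences (labels == 1) of the query."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n = int(labels.sum())
    if n == 0:
        raise ValueError("query has no true occurrences")
    top = np.argsort(-scores, kind="stable")[:n]
    return labels[top].sum() / n


def equal_error_rate(scores, labels):
    """Approximate EER: sweep the threshold over the observed scores and
    return the mean of the miss and false-alarm rates at the threshold
    where the two rates are closest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    best_gap, eer = np.inf, None
    for thr in np.unique(scores):
        accept = scores >= thr
        miss = np.sum(labels & ~accept) / n_pos
        false_alarm = np.sum(~labels & accept) / n_neg
        if abs(miss - false_alarm) < best_gap:
            best_gap = abs(miss - false_alarm)
            eer = (miss + false_alarm) / 2
    return eer


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 1]
p_at_n = precision_at_n(scores, labels)   # 2 of the top 3 are hits
eer = equal_error_rate(scores, labels)    # rates cross at 1/3
```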
