Paper Information - A Novel Discriminative Score Calibration Method for Keyword Search


The performance of keyword search systems depends heavily on the quality of confidence scores. In this work, we propose a novel discriminative score calibration method. By training an MLP classifier on the word posterior probability and several novel normalized scores, we obtain a relative improvement of 4.67% in the actual term-weighted value (ATWV) metric on the OpenKWS15 development test dataset. In addition, we propose an LSTM-CTC based keyword verification method to supply extra acoustic information. After this information is added, a further improvement of 7.05% over the baseline can be observed.

Meng Cai | Zhiqiang Lv | Jia Liu | Wei-Qiang Zhang
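
The abstract reports its gains in ATWV, so a small sketch of how that metric is usually computed may help in reading the numbers. The code below follows the standard NIST term-weighted value definition, 1 minus the keyword-averaged sum of the miss probability and a weighted false-alarm probability, with the conventional weight of 999.9. The names `KeywordCounts`, `term_weighted_value`, and `relative_improvement` are hypothetical, the counts in the usage lines are made up for illustration and are not taken from the paper, and reading "relative improvement" as (new - old) / old is an assumption.

```python
from dataclasses import dataclass

BETA = 999.9  # conventional false-alarm weight in the NIST term-weighted value


@dataclass
class KeywordCounts:
    """Per-keyword detection counts at a fixed decision threshold (hypothetical container)."""
    n_true: int         # reference occurrences of the keyword
    n_correct: int      # detections that match a reference occurrence
    n_false_alarm: int  # detections that match no reference occurrence


def term_weighted_value(counts, speech_seconds, beta=BETA):
    """Return 1 - mean(P_miss + beta * P_FA) over keywords that occur in the reference."""
    losses = []
    for c in counts:
        if c.n_true == 0:
            continue  # keywords without reference occurrences are left out of the average
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = c.n_false_alarm / (speech_seconds - c.n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no keyword has reference occurrences")
    return 1.0 - sum(losses) / len(losses)


def relative_improvement(baseline, system):
    """Relative change of a score with respect to the baseline (assumed definition)."""
    return (system - baseline) / baseline


if __name__ == "__main__":
    # Illustrative counts only; ten hours of speech is an arbitrary choice.
    toy = [KeywordCounts(10, 8, 1), KeywordCounts(5, 3, 0)]
    print(term_weighted_value(toy, speech_seconds=36000.0))
    print(relative_improvement(0.50, 0.55))
```

Because the false-alarm term is multiplied by 999.9, a single false alarm on a rare keyword can cost as much as several misses, which is why well-calibrated confidence scores matter so much for this metric.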
