Towards ASR Based on Hierarchical Posterior-Based Keyword Recognition
The paper presents an alternative approach to automatic speech recognition in which each target word is detected by a separate binary classifier that discriminates it from all other sounds. No time alignment is performed. A recognizer for N words is built by applying N binary classifiers in parallel. The system first estimates posterior probabilities of phoneme classes, uniformly sampled in time. In a second step, a rather long sliding time window is applied to the phoneme posterior estimates, and its content is classified by an artificial neural network to yield the posterior probability of the keyword. On a small-vocabulary ASR task, the system does not yet reach the performance of a state-of-the-art system, but its conceptual simplicity, the ease of adding new target words, and its inherent resistance to out-of-vocabulary sounds may prove a significant advantage in many applications.
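The second stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the window length, number of phoneme classes, hidden-layer size, and the `KeywordMLP` class are all hypothetical choices made for illustration, the network weights are random rather than trained, and the phoneme posteriors are random stand-ins for the output of the first stage.

```python
import numpy as np


def sliding_windows(posteriors, window_len):
    """Flatten every run of `window_len` consecutive frames into one vector."""
    num_frames, _ = posteriors.shape
    num_windows = num_frames - window_len + 1
    if num_windows < 1:
        raise ValueError("utterance is shorter than the window")
    return np.stack(
        [posteriors[t:t + window_len].ravel() for t in range(num_windows)]
    )


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class KeywordMLP:
    """One binary classifier: keyword vs. all other sounds.

    Hypothetical single-hidden-layer network with random (untrained) weights.
    """

    def __init__(self, input_dim, hidden_dim, rng):
        self.w1 = rng.normal(0.0, 0.1, size=(input_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.1, size=hidden_dim)
        self.b2 = 0.0

    def posterior(self, windows):
        hidden = np.tanh(windows @ self.w1 + self.b1)
        return sigmoid(hidden @ self.w2 + self.b2)


def keyword_posteriors(phoneme_posteriors, classifiers, window_len):
    """Apply each keyword's classifier to the same sliding windows."""
    windows = sliding_windows(phoneme_posteriors, window_len)
    return {word: clf.posterior(windows) for word, clf in classifiers.items()}


rng = np.random.default_rng(0)

# Assumed sizes, chosen only for the example.
num_frames, num_phonemes, window_len, hidden_dim = 200, 29, 101, 30

# Stand-in for stage 1: each row is a distribution over phoneme classes.
phoneme_posteriors = rng.dirichlet(np.ones(num_phonemes), size=num_frames)

# N target words -> N independent binary classifiers.
classifiers = {
    word: KeywordMLP(window_len * num_phonemes, hidden_dim, rng)
    for word in ("one", "two", "three")
}

scores = keyword_posteriors(phoneme_posteriors, classifiers, window_len)
```

Each entry of `scores` is a per-window keyword posterior trajectory. Because the classifiers share nothing but their input, adding a new target word in this sketch amounts to adding one more entry to `classifiers`, which mirrors the ease of extension the abstract mentions.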