Kernel metric learning for phonetic classification

While a spoken sound is described by a sequence of frame-level spectral vectors, not all frames contribute equally to either human perception or machine classification. In this paper, we introduce a novel framework that automatically emphasizes the speech frames most relevant to phonetic information. We learn frame importance jointly with a distance metric across the phone classes, attempting to satisfy a large-margin constraint: the distance from a segment to its correct phone class should be less than the distance to any other phone class by the largest possible margin. Furthermore, a universal background model structure is proposed to give the correspondence between statistical models of phone types and tokens, allowing us to use statistical models of each phone token in a large-margin speech recognition framework. Experiments on the TIMIT database demonstrate the effectiveness of our framework.
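To make the large-margin constraint concrete, a minimal sketch of one possible formulation follows; the symbols (the learned distance $d_M$, segment $x_i$ with correct class $y_i$, class models $c_z$, and margin $\rho$) are illustrative notation assumed here, not definitions taken from the abstract:

\[
\max_{M \succeq 0,\ \rho \ge 0} \; \rho
\quad \text{s.t.} \quad
d_M(x_i, c_z) - d_M(x_i, c_{y_i}) \ge \rho
\qquad \forall i,\ \forall z \ne y_i .
\]

Under this reading, each segment is required to lie closer (under the learned, frame-weighted metric $d_M$) to its own phone-class model than to any competing class model, and the optimization pushes that separation $\rho$ to be as large as possible.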