Custom-designed SVM kernels for improved robustness of phoneme classification

We investigate the robustness of phoneme classification in the acoustic waveform domain to additive white Gaussian noise and pink noise, using support vector machines (SVMs). We focus on designing kernels that are tuned to the physical properties of speech. For comparison, results are reported for the perceptual linear predictive (PLP) representation of speech using standard kernels. We show that incorporating the properties of speech into the kernels yields major improvements. Furthermore, the high-dimensional acoustic waveforms prove more robust to additive noise. Finally, we investigate a combination of the PLP and acoustic waveform representations that attains better classification than either individual representation over a range of noise levels.
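
As a rough illustration of what tuning a kernel to a property of speech can look like, the sketch below builds a sign-invariant ("even") kernel from a standard polynomial kernel. The choice of sign invariance (treating a waveform x and its negation -x as the same sound) is an assumption made for this example and is not stated in the abstract; the function names `polynomial_kernel`, `sign_invariant`, and `gram_matrix` are likewise hypothetical. The construction assumes the base kernel satisfies K(-x, -y) = K(x, y), which holds for any kernel that depends only on the dot product.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def polynomial_kernel(x: Vector, y: Vector, degree: int = 3, coef0: float = 1.0) -> float:
    """Standard inhomogeneous polynomial kernel (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def sign_invariant(kernel: Kernel) -> Kernel:
    """Wrap a base kernel so the result is unchanged when either input is negated.

    Assumption for illustration: a waveform and its sign-flipped copy belong
    to the same class. Requires kernel(-x, -y) == kernel(x, y).
    """
    def even_kernel(x: Vector, y: Vector) -> float:
        neg_y = [-v for v in y]
        # Average the base kernel over both signs of the second argument.
        return 0.5 * (kernel(x, y) + kernel(x, neg_y))

    return even_kernel


def gram_matrix(kernel: Kernel, xs: Sequence[Vector], ys: Sequence[Vector]) -> List[List[float]]:
    """Evaluate the kernel on every pair (x, y), one row per element of xs."""
    return [[kernel(x, y) for y in ys] for x in xs]


if __name__ == "__main__":
    even_poly = sign_invariant(polynomial_kernel)
    x = [1.0, 2.0]
    y = [0.5, -1.0]
    flipped_y = [-v for v in y]
    # The two values agree because the wrapped kernel ignores the sign of y.
    print(even_poly(x, y), even_poly(x, flipped_y))
```

A Gram matrix computed this way could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.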
