Towards robust phoneme classification: Augmentation of PLP models with acoustic waveforms

We investigate the robustness of phoneme segment classification with generative classifiers, comparing the PLP and acoustic waveform representations of speech in the presence of white Gaussian noise. We combine the strengths of the two representations: the excellent classification accuracy of PLP in quiet conditions and the additional robustness of acoustic waveform classifiers. The combination is a convex combination of the two classifiers' log-likelihoods, which yields a single decision function. The resulting combined classifier is uniformly as accurate as PLP alone and is significantly more robust to additive noise during testing. We also consider noise modelling and time-invariant classification of acoustic waveforms, and use initial solutions to both problems to improve accuracy.
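The convex combination of log-likelihoods described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the weight `alpha`, the convention that `alpha` weights the PLP term, and the numeric log-likelihoods are all assumptions made for the example. It also assumes the two sets of log-likelihoods are already on comparable scales, which the abstract does not specify.

```python
import numpy as np

def combined_scores(ll_plp, ll_wave, alpha):
    """Convex combination of per-class log-likelihoods.

    alpha = 1 uses the PLP classifier only, alpha = 0 the waveform
    classifier only (this weighting convention is an assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ll_plp = np.asarray(ll_plp, dtype=float)
    ll_wave = np.asarray(ll_wave, dtype=float)
    return alpha * ll_plp + (1.0 - alpha) * ll_wave

def classify(ll_plp, ll_wave, alpha):
    """Return the index of the class with the highest combined score."""
    return int(np.argmax(combined_scores(ll_plp, ll_wave, alpha)))

# Hypothetical log-likelihoods of one segment under three phoneme classes.
ll_plp = [-10.0, -12.0, -11.0]
ll_wave = [-30.0, -20.0, -25.0]

pred_plp_only = classify(ll_plp, ll_wave, 1.0)
pred_wave_only = classify(ll_plp, ll_wave, 0.0)
pred_mixed = classify(ll_plp, ll_wave, 0.5)
```

In practice the weight would presumably be chosen as a function of the noise level, leaning towards PLP in quiet conditions and towards the waveform classifier as noise increases; how that choice is made is not stated in the abstract.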
