Perceptually Weighted Mel-Cepstrum Analysis of Speech Based on Psychoacoustic Model

This letter proposes a mel-cepstral analysis method based on the MPEG psychoacoustic model. A perceptual weighting function is derived by applying cubic spline interpolation to the signal-to-mask ratios (SMRs) obtained from the psychoacoustic model. In speaker identification and speech re-synthesis experiments, the proposed method improved both speaker recognition performance and the quality of the re-synthesized speech.
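
As an illustration of the interpolation step only, the following is a minimal sketch of how per-band SMRs might be turned into a smooth weighting curve over frequency. It is not the letter's implementation: the band centre frequencies, the SMR values, the natural boundary condition of the spline, the clamping outside the band range, and the dB-to-linear mapping `10 ** (SMR / 10)` are all assumptions made for the example, and `natural_cubic_spline` is a hypothetical helper name.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. Outside [xs[0], xs[-1]] the input is
    clamped to the nearest knot (an assumption made for this sketch).
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Second derivatives at the knots; natural boundary: m[0] = m[n] = 0.
    m = [0.0] * (n + 1)
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [
            6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
            for i in range(1, n)
        ]
        # Forward elimination (Thomas algorithm).
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # Back substitution.
        m[n - 1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(t):
        t = min(max(t, xs[0]), xs[-1])  # clamp to the knot range
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)
        a, b = xs[i + 1] - t, t - xs[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


# Hypothetical band centre frequencies (Hz) and made-up SMR values (dB).
band_centers_hz = [100.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
smr_db = [4.0, 12.0, 15.0, 9.0, 3.0, -2.0]

# Smooth SMR curve over frequency, sampled on a uniform grid from 0 to 8 kHz.
smr_curve = natural_cubic_spline(band_centers_hz, smr_db)
bin_freqs_hz = [k * 8000.0 / 256 for k in range(257)]

# Assumed mapping from SMR in dB to a positive linear-scale weight.
weights = [10.0 ** (smr_curve(f) / 10.0) for f in bin_freqs_hz]
```

Under these assumptions, frequency regions with a high SMR (signal well above the masking threshold) receive a larger weight than regions where the signal is close to or below the threshold. How the weights enter the mel-cepstral analysis criterion is not specified here and is left out of the sketch.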
