Paper information - Perceptually Weighted Mel-Cepstrum Analysis of Speech Based on Psychoacoustic Model

This letter proposes a novel approach to mel-cepstral analysis based on the MPEG psychoacoustic model. A perceptual weighting function is developed by applying cubic spline interpolation to the signal-to-mask ratios (SMRs) obtained from the psychoacoustic model. Experiments on speaker identification and speech re-synthesis showed that the proposed method improved both the speaker recognition performance and the quality of the re-synthesized speech.
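The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that one SMR value per band is already available and only shows how a cubic spline turns those sparse values into a smooth curve over frequency. The band centers, the SMR values in dB, the evaluation grid, and the helper name `natural_cubic_spline` are all hypothetical; the natural boundary condition is likewise an assumption, and the abstract does not say how the interpolated SMR curve is mapped to the final weighting function, so that step is left out.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    if n < 2 or len(ys) != len(xs):
        raise ValueError("need at least three knots with matching values")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    if any(step <= 0 for step in h):
        raise ValueError("knots must be strictly increasing")

    # Tridiagonal system for the interior second derivatives m[1..n-1].
    # The natural boundary condition (an assumption here) fixes m[0] = m[n] = 0.
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [
        6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        for i in range(1, n)
    ]

    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, n - 1):
        factor = h[k] / diag[k - 1]
        diag[k] -= factor * h[k]
        rhs[k] -= factor * rhs[k - 1]
    interior = [0.0] * (n - 1)
    interior[-1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):
        interior[k] = (rhs[k] - h[k + 1] * interior[k + 1]) / diag[k]
    m = [0.0] + interior + [0.0]

    def evaluate(x):
        # Locate the interval [xs[i], xs[i + 1]]; points outside the knot
        # range are extrapolated with the nearest end segment.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a = xs[i + 1] - x
        b = x - xs[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


# Hypothetical inputs: eight band center frequencies (Hz) and one SMR (dB) each.
band_centers_hz = [250.0, 750.0, 1250.0, 1750.0, 2250.0, 2750.0, 3250.0, 3750.0]
smr_db = [12.0, 18.5, 15.0, 9.0, 6.5, 3.0, -1.0, -4.0]

smr_curve = natural_cubic_spline(band_centers_hz, smr_db)

# Evaluate on a denser, hypothetical frequency grid (125 Hz spacing).
grid_hz = [250.0 + 125.0 * k for k in range(29)]
dense_smr = [smr_curve(f) for f in grid_hz]
```

The spline reproduces the given SMR values exactly at the band centers and fills in smooth values between them, which is the property that makes it suitable for building a continuous function of frequency from per-band measurements.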

[1] Wonho Yang, et al. A modified bark spectral distortion measure which uses noise masking threshold, 1997, 1997 IEEE Workshop on Speech Coding for Telecommunications Proceedings. Back to Basics: Attacking Fundamental Problems in Speech Coding.

[2] H. Hermansky, et al. Perceptual linear predictive (PLP) analysis of speech, 1990, The Journal of the Acoustical Society of America.

[3] Georg Heinig, et al. New Fast Algorithms for Toeplitz-Plus-Hankel Matrices, 2003, SIAM J. Matrix Anal. Appl.

[4] Keiichi Tokuda, et al. Adaptive cepstral analysis of speech, 1995, IEEE Trans. Speech Audio Process.

[5] S. Imai, et al. Mel Log Spectrum Approximation (MLSA) filter for speech synthesis, 1983.

[6] Takao Kobayashi, et al. Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis, 2005, IEICE Trans. Inf. Syst.