Paper information - A pitch synchronous feature extraction method for speaker recognition

A pitch synchronous feature extraction method for speaker recognition

The paper presents a novel feature extraction method to improve the performance of speaker identification systems. The proposed feature has the same form as a conventional feature, Mel frequency cepstral coefficients (MFCC), but uses flexible segmentation to reduce the spectral mismatch between training and testing. Specifically, the length and shift of each analysis frame are determined pitch-synchronously; the resulting feature is called pitch synchronous MFCC (PSMFCC). To verify the performance of the new feature, we measure the cepstral distortion between training and testing and also perform closed-set speaker identification tests. In text-independent and text-dependent experiments, the proposed algorithm provides 44.3% and 26.7% relative improvement, respectively.
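
The idea of tying frame length and shift to the pitch period can be illustrated with a minimal sketch. It is not the paper's algorithm: the autocorrelation pitch estimator, the function names `estimate_pitch_period` and `pitch_synchronous_frames`, the search range of 60 to 400 Hz, and the choice of two pitch periods per frame are all assumptions made for illustration, and the MFCC computation itself is omitted. A synthetic 100 Hz sinusoid stands in for voiced speech.

```python
import math


def estimate_pitch_period(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Return the lag (in samples) with the highest normalised autocorrelation.

    Assumption: a plain autocorrelation search, not the paper's pitch tracker.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(signal) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        n_terms = len(signal) - lag
        score = sum(signal[i] * signal[i + lag] for i in range(n_terms)) / n_terms
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def pitch_synchronous_frames(signal, period, periods_per_frame=2):
    """Cut frames that span whole pitch periods and advance by one period.

    Assumption: periods_per_frame=2 is an illustrative value only.
    """
    frame_len = period * periods_per_frame
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, period)]


# Synthetic "voiced" signal: 0.1 s of a 100 Hz sinusoid sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

period = estimate_pitch_period(signal, sample_rate)
frames = pitch_synchronous_frames(signal, period)

# Largest sample-wise difference between two consecutive frames.
max_diff = max(abs(a - b) for a, b in zip(frames[0], frames[1]))

print("pitch period (samples):", period)
print("number of frames:", len(frames))
print("max difference between consecutive frames:", max_diff)
```

Because every frame starts at the same phase of the pitch cycle and covers a whole number of periods, consecutive frames of a stationary periodic signal are nearly identical. This is the kind of frame-to-frame consistency that fixed-length framing does not guarantee. Real speech would additionally need voiced/unvoiced handling and a pitch estimate that is updated over time.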
