Robustness of bit-stream based features for speaker verification

This paper presents a speaker verification system evaluated on the YOHO database after coding with the ITU-T G.729 standard. Speaker models were built from a set of bitstream-based features: 16 LPC cepstral coefficients and MFCCs derived from the quantized line spectral pairs, together with residual information in the form of pitch. The robustness of these features was studied under white-noise conditions. Results suggest that, with a cohort model, MFCCs are more robust to noise than LPC cepstral coefficients, and that adding pitch to the feature vector improves verification performance by 16% to 29%, depending on the noise condition.
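
As an illustration of how LPC cepstral coefficients can be obtained from linear prediction coefficients, the following minimal sketch applies the standard LPC-to-cepstrum recursion. It is not taken from the paper: the function name `lpc_to_cepstrum` is hypothetical, the sign convention A(z) = 1 + sum_k a_k z^-k is an assumption, and the LPC coefficients are assumed to have already been recovered from the decoded line spectral pairs (that conversion is not shown).

```python
def lpc_to_cepstrum(a, n_ceps=16):
    """Cepstrum c_1..c_n_ceps of the all-pole filter 1/A(z).

    Assumed convention (not from the paper): A(z) = 1 + sum_k a[k-1] * z^-k,
    so `a` holds a_1..a_p. G.729 uses 10th-order LP, so p would be 10 there.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # Recursive term: sum over k of k * c_k * a_(n-k), restricted to 1 <= n-k <= p
        acc = 0.0
        for k in range(max(1, n - p), n):
            acc += k * c[k - 1] * a[n - k - 1]
        # Direct term -a_n exists only while n is within the LPC order
        direct = -a[n - 1] if n <= p else 0.0
        c.append(direct - acc / n)
    return c


# Sanity check on a one-pole filter 1 / (1 - 0.5 z^-1), whose cepstrum is 0.5**n / n
ceps = lpc_to_cepstrum([-0.5], n_ceps=3)
```

Requesting more cepstral coefficients than the LPC order, as with 16 coefficients from a 10th-order model, is handled by the same recursion: beyond order p only the recursive term contributes.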
