COMPACT SPEECH FEATURES BASED ON WAVELET TRANSFORM AND PCA WITH APPLICATION TO SPEAKER IDENTIFICATION

The main goal of this paper is to find effective methods for improving the performance of speaker identification systems. We use the wavelet transform to decompose speech signals into several frequency bands and then use cepstral coefficients to capture the individual characteristics of the vocal tract within the bands of interest, based on the acoustic characteristics of the human ear. In addition, an adaptive wavelet-based filtering mechanism is applied to suppress the small variations in the wavelet coefficients caused by noise. To make effective use of these multi-band speech features, we propose a modified vector quantization method, multi-layer eigen-codebook vector quantization (MLECVQ), as the identifier. This model uses a multi-layer structure to eliminate interference between the coefficients of different bands, and then uses principal component analysis (PCA) to evaluate the codebooks so that they capture more detail of phoneme characteristics. Experimental results show that the proposed method outperforms the GMM+MFCC model in both computational cost and recognition performance, on both clean and noisy speech data.
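The feature-extraction front end described above can be sketched in a few lines. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. It uses the Haar wavelet, because the abstract does not name the wavelet. It applies a plain dyadic split into bands with no ear-motivated band selection. The real cepstrum stands in for whichever cepstral coefficients the paper computes. For the "adaptive wavelet-based filtering", it substitutes Donoho-Johnstone soft thresholding with a per-band noise estimate. Helper names such as `wavelet_bands`, `soft_threshold`, and `subband_cepstra` are hypothetical.

```python
import math
import statistics

def haar_step(x):
    """One level of the orthonormal Haar DWT (assumed wavelet): (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    n = len(x) - len(x) % 2  # an odd trailing sample is dropped
    approx = [s * (x[i] + x[i + 1]) for i in range(0, n, 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, n, 2)]
    return approx, detail

def wavelet_bands(frame, levels):
    """Dyadic split: [detail_1 (highest band), ..., detail_L, approx_L (lowest band)]."""
    bands, approx = [], list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

def soft_threshold(band):
    """Soft thresholding (a swapped-in choice): the noise level is estimated from the
    band itself (median |coefficient| / 0.6745), so the threshold adapts per band."""
    if not band:
        return []
    sigma = statistics.median([abs(c) for c in band]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(band)))  # universal threshold
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in band]

def real_cepstrum(band, n_coeffs):
    """First n_coeffs real-cepstrum coefficients: inverse DFT of log|DFT(band)|, naive O(N^2)."""
    N = len(band)
    log_mag = []
    for k in range(N):
        re = sum(v * math.cos(2 * math.pi * k * n / N) for n, v in enumerate(band))
        im = sum(v * math.sin(2 * math.pi * k * n / N) for n, v in enumerate(band))
        log_mag.append(math.log(math.hypot(re, im) + 1e-12))  # guard against log(0)
    # log|X[k]| is real and even in k, so the inverse DFT reduces to a cosine sum.
    return [sum(lm * math.cos(2 * math.pi * k * q / N) for k, lm in enumerate(log_mag)) / N
            for q in range(n_coeffs)]

def subband_cepstra(frame, levels=3, n_coeffs=4):
    """One cepstral vector per band for a frame; only the detail bands are thresholded."""
    bands = wavelet_bands(frame, levels)
    cleaned = [soft_threshold(b) for b in bands[:-1]] + [bands[-1]]
    return [real_cepstrum(b, n_coeffs) for b in cleaned]
```

Because the Haar step is orthonormal, the bands together carry exactly the frame's energy. Any loss of information therefore comes from the thresholding, not from the decomposition itself.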

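The MLECVQ identifier can be sketched in the same way, but the abstract does not define the eigen-codebook precisely. The following is therefore only one plausible reading, and every choice in it is an assumption:

- Each wavelet band gets its own codebook layer, so coefficients from different bands never mix.
- Codewords come from plain k-means, standing in for LBG training.
- Each codeword stores its cluster mean, the cluster's first principal axis, the variance along that axis, and the mean residual variance.
- The distortion is a PCA-weighted (Mahalanobis-like) distance.

A test utterance goes to the speaker with the lowest average distortion. The names `eigen_codeword`, `train_speaker`, and `identify` are hypothetical.

```python
import math
import random

EPS = 1e-9  # variance floor so the distance never divides by zero

def mean(vs):
    return [sum(col) / len(vs) for col in zip(*vs)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kmeans(vs, k, iters=20, seed=0):
    """Plain k-means (stand-in for LBG codebook design); returns the non-empty clusters."""
    centers = random.Random(seed).sample(vs, k)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in vs:
            j = min(range(len(centers)), key=lambda i: dot(sub(v, centers[i]), sub(v, centers[i])))
            groups[j].append(v)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]

def eigen_codeword(cluster, iters=100):
    """(mean, principal axis, variance on the axis, mean residual variance) of a cluster."""
    m = mean(cluster)
    d = len(m)
    centered = [sub(v, m) for v in cluster]
    total_var = sum(dot(c, c) for c in centered) / len(cluster)  # trace of the covariance
    axis, lam = [1.0 / math.sqrt(d)] * d, 0.0
    # Power iteration on the covariance without forming it; assumes the start
    # vector is not exactly orthogonal to the top eigenvector.
    for _ in range(iters):
        w = [0.0] * d
        for c in centered:
            p = dot(c, axis)
            w = [wj + p * cj for wj, cj in zip(w, c)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break  # all points coincide
        axis, lam = [x / norm for x in w], norm / len(cluster)
    resid_var = (total_var - lam) / max(d - 1, 1)
    return m, axis, max(lam, EPS), max(resid_var, EPS)

def distortion(x, codebook):
    """Smallest PCA-weighted distance from x to any eigen-codeword."""
    best = math.inf
    for m, axis, lam, resid_var in codebook:
        r = sub(x, m)
        p = dot(r, axis)
        off_axis = max(dot(r, r) - p * p, 0.0)
        best = min(best, p * p / lam + off_axis / resid_var)
    return best

def train_speaker(frames, k):
    """One eigen-codebook per band ("layer"); frames[t][b] is band b's feature vector."""
    return [[eigen_codeword(c) for c in kmeans([f[b] for f in frames], k)]
            for b in range(len(frames[0]))]

def identify(frames, models):
    """Speaker whose layered codebooks give the lowest average distortion."""
    def score(layers):
        return sum(distortion(f[b], cb) for f in frames for b, cb in enumerate(layers)) / len(frames)
    return min(models, key=lambda name: score(models[name]))
```

The sketch scales the on-axis component by that axis's variance rather than ignoring it. This keeps a codeword from matching vectors that lie far from its mean but happen to fall along its principal direction.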