Discriminative Feature Extraction Based on Self-Adaptive Frequency Warping for Robust Speaker Identification

This paper presents a new discriminative feature based on self-adaptive frequency warping. We analyze the dependency between frequency components and individual speaker characteristics and quantify this dependency. The new feature is extracted by nonuniform sub-band filters designed according to self-adaptive frequency warping in different frequency bands. Furthermore, to overcome the acoustic mismatch between training and testing data in noisy environments, we apply pre-enhancement before the feature extraction module. A series of controlled experiments shows that the feature is well founded and insensitive to spoken content, and thus more discriminative and robust than conventional Mel-frequency cepstral coefficients. The experimental results demonstrate that combining pre-enhancement with the discriminative feature yields a noticeable improvement in speaker recognition rate and robustness.
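
As a rough illustration of how a nonuniform sub-band filter bank might follow a quantified band-wise dependency, the sketch below places filter center frequencies uniformly on a warped axis whose local resolution is proportional to a per-band discrimination score. The function name `warped_center_frequencies`, the piecewise-constant scores, and the example values are all assumptions made for illustration; this is not the paper's actual warping function or filter design.

```python
def warped_center_frequencies(band_edges_hz, scores, num_filters):
    """Return filter centers (Hz) spaced uniformly on a warped frequency axis.

    band_edges_hz: increasing band edges, e.g. [0, 1000, 2000]
    scores: one positive (hypothetical) discrimination score per band
    num_filters: number of center frequencies to place
    """
    if len(scores) != len(band_edges_hz) - 1:
        raise ValueError("need exactly one score per band")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive")

    widths = [hi - lo for lo, hi in zip(band_edges_hz, band_edges_hz[1:])]

    # Warped position of each band edge: cumulative sum of width * score,
    # so bands with a higher score occupy more of the warped axis.
    warped = [0.0]
    for width, score in zip(widths, scores):
        warped.append(warped[-1] + width * score)
    total = warped[-1]

    centers = []
    for k in range(1, num_filters + 1):
        target = total * k / (num_filters + 1)  # uniform step on warped axis
        i = 0
        while warped[i + 1] < target:           # find the band holding target
            i += 1
        frac = (target - warped[i]) / (warped[i + 1] - warped[i])
        centers.append(band_edges_hz[i] + frac * widths[i])  # map back to Hz
    return centers


# Made-up scores: the lowest and highest bands are treated as more
# speaker-discriminative, so they receive denser filter placement.
edges = [0, 1000, 2000, 3000, 4000]
centers = warped_center_frequencies(edges, [3, 1, 1, 3], 7)
```

With equal scores the centers fall back to a uniform linear spacing, which gives a simple sanity check for the mapping. A full feature extractor would still need filter shapes, log sub-band energies, and a cepstral transform, none of which are sketched here.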
