Combining Speech Enhancement and Discriminative Feature Extraction for Robust Speaker Recognition

It is well known that discriminative features and effective robust processing are two key techniques in speaker recognition. This paper presents a new strategy that combines speech enhancement with discriminative feature extraction to overcome the acoustic mismatch between training and testing data in noisy environments. On the one hand, comparison results in two noise environments indicate that recognition rates based on DFCC are on average 6.11% (white noise) and 8% (factory noise) higher than those based on MFCC, which confirms the discriminative power and robustness of DFCC. On the other hand, when speech enhancement is combined with the discriminative feature, the improvement based on SMFCC is limited, only 0.93% and 1.87%, whereas performance based on SDFCC improves by 2.54% and 2.31%.
