Investigation of Spectral Centroid Magnitude and Frequency for Speaker Recognition

Most conventional features used in speaker recognition are based on spectral envelope characterizations such as Mel-scale filterbank cepstrum coefficients (MFCC), Linear Prediction Cepstrum Coefficients (LPCC) and Perceptual Linear Prediction (PLP). The success of MFCCs has made them a de facto standard feature for speaker recognition. Alternative features that convey information other than the average subband energy have been proposed, such as frequency modulation (FM) and subband spectral centroid features. In this study, we investigate the characterization of subband energy as a two-dimensional feature comprising the Spectral Centroid Magnitude (SCM) and the Spectral Centroid Frequency (SCF). Experiments carried out on the NIST 2001 and NIST 2006 databases using SCF, SCM and their fusion suggest that the combination of SCM and SCF is somewhat more accurate than conventional MFCCs, and that both fuse effectively with MFCCs. We also show that frame-averaged FM features are essentially centroid features, and provide an SCF implementation that improves on the speaker recognition performance of both subband spectral centroid and FM features.
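To make the two-dimensional view of subband energy concrete, the sketch below computes a (SCF, SCM) pair per subband from one frame's magnitude spectrum. It assumes a common formulation and is not necessarily the exact implementation used in this study. For subband m with filter weights w_m(k), bin frequency f(k) and magnitude |S(k)|, SCF_m = Σ f(k)|S(k)|w_m(k) / Σ |S(k)|w_m(k) (a magnitude-weighted mean frequency), and SCM_m = Σ f(k)|S(k)|w_m(k) / Σ f(k)w_m(k) (a frequency-weighted mean magnitude). The helper `triangular_filterbank`, its linearly spaced `edges` and the function names are hypothetical illustrations. A real front end would typically use mel- or otherwise-spaced filters applied to an FFT of a windowed frame.

```python
from typing import List, Sequence, Tuple


def triangular_filterbank(freqs: Sequence[float],
                          edges: Sequence[float]) -> List[List[float]]:
    """Hypothetical helper: triangular filters whose (low, centre, high)
    corners are consecutive triples of `edges` (assumed linear spacing)."""
    filters = []
    for lo, centre, hi in zip(edges[:-2], edges[1:-1], edges[2:]):
        weights = []
        for f in freqs:
            if lo <= f <= centre:
                weights.append((f - lo) / (centre - lo))   # rising slope
            elif centre < f <= hi:
                weights.append((hi - f) / (hi - centre))   # falling slope
            else:
                weights.append(0.0)                        # outside subband
        filters.append(weights)
    return filters


def spectral_centroids(mag: Sequence[float],
                       freqs: Sequence[float],
                       filters: Sequence[Sequence[float]]
                       ) -> List[Tuple[float, float]]:
    """Return one (SCF, SCM) pair per subband for a single frame.

    Assumed definitions:
      SCF = sum f*|S|*w / sum |S|*w   (magnitude-weighted mean frequency)
      SCM = sum f*|S|*w / sum f*w     (frequency-weighted mean magnitude)
    """
    features = []
    for w in filters:
        shared = sum(f * m * wk for f, m, wk in zip(freqs, mag, w))
        mag_weight = sum(m * wk for m, wk in zip(mag, w))
        freq_weight = sum(f * wk for f, wk in zip(freqs, w))
        # Guard against subbands with no energy (or no frequency support).
        scf = shared / mag_weight if mag_weight > 0 else 0.0
        scm = shared / freq_weight if freq_weight > 0 else 0.0
        features.append((scf, scm))
    return features


if __name__ == "__main__":
    freqs = [100.0 * k for k in range(41)]            # 0..4000 Hz bins
    bank = triangular_filterbank(freqs, [0.0, 1000.0, 2000.0, 3000.0, 4000.0])
    spectrum = [0.0] * len(freqs)
    spectrum[12] = 5.0                                # single peak at 1200 Hz
    for scf, scm in spectral_centroids(spectrum, freqs, bank):
        print(f"SCF={scf:.1f} Hz  SCM={scm:.4f}")
```

Because SCF and SCM share the same numerator, the two values split one weighted energy sum into a "where" (frequency) and a "how much" (magnitude) component. For a flat spectrum, SCM returns the flat magnitude in every subband, and SCF falls at the centre of each symmetric filter.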
