Speaker information from subband energies of Linear Prediction residual

The objective of this work is to demonstrate the significant speaker information present in the subband energies of the Linear Prediction (LP) residual. The LP residual mostly contains the excitation source information. The subband energies extracted using the mel filterbank followed by cepstral analysis provides a compact representation. The resulting cepstral values are termed as Residual-mel Frequency Cepstral Coefficients (R-MFCC). The speaker identification studies conducted using R-MFCC as features and Gaussian mixture model (GMM) on a subset of 30 speakers from NIST-1999 provides 87% accuracy. The performance using MFCC extracted directly from speech provides 87% accuracy. Further, the combination of the two provides 90% accuracy indicating the different aspect of speaker information present in R-MFCC.

[1]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[2]  B. Atal Automatic Speaker Recognition Based on Pitch Contours , 1969 .

[3]  Douglas A. Reynolds,et al.  Modeling of the glottal flow derivative waveform with application to speaker identification , 1999, IEEE Trans. Speech Audio Process..

[4]  J. Makhoul,et al.  Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[5]  Wang Bingxi,et al.  Text-independent speaker recognition using support vector machine , 2001, 2001 International Conferences on Info-Tech and Info-Net. Proceedings (Cat. No.01EX479).

[6]  Alvin F. Martin,et al.  The NIST 1999 Speaker Recognition Evaluation - An Overview , 2000, Digit. Signal Process..

[7]  Kishore Prahallad,et al.  AANN: an alternative to GMM for pattern recognition , 2002, Neural Networks.

[8]  Jr. J.P. Campbell,et al.  Speaker recognition: a tutorial , 1997, Proc. IEEE.

[9]  S. R. Mahadeva Prasanna,et al.  Extraction of speaker-specific excitation information from linear prediction residual of speech , 2006, Speech Commun..

[10]  S. R. Mahadeva Prasanna,et al.  Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification system , 2005, IEEE Transactions on Speech and Audio Processing.

[11]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..

[12]  Heinz Hügli,et al.  Usefulness of the LPC-residue in text-independent speaker verification , 1995, Speech Commun..

[13]  Bayya Yegnanarayana,et al.  Combining evidence from residual phase and MFCC features for speaker recognition , 2006, IEEE Signal Processing Letters.