Neural network models for extracting complementary speaker-specific information from residual phase

In this paper, we use neural network models to demonstrate that the linear prediction (LP) residual phase carries speaker-specific information complementary to that in conventional spectral features. Spectral features mainly represent the speaker-specific characteristics of the vocal tract system, whereas the proposed LP residual phase represents speaker-specific excitation source information. Speaker recognition studies are conducted on the NIST 2003 speaker recognition evaluation database. A system using only spectral features gives an equal error rate (EER) of 15.5%, and a system using only LP residual phase information gives an EER of 22.0%. Combining the evidence from the LP residual phase and the spectral features improves performance, reducing the EER to 13.5%. This result demonstrates the complementary nature of the speaker-specific information present in the LP residual phase.
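
To make the EER figures and the effect of combining evidence concrete, the following is a minimal sketch, not the authors' implementation. It assumes a weighted-sum rule for score fusion, which the abstract does not specify, and it uses made-up toy scores rather than NIST 2003 data. The names `equal_error_rate`, `fuse`, and all score lists are hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep every observed score as a threshold and return
    the error rate where false-accept and false-reject rates are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def fuse(scores_a, scores_b, weight=0.5):
    """Weighted-sum score fusion (an assumed rule, not taken from the paper)."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]


# Toy scores, invented for illustration only. Each system misranks a
# different genuine trial, so their errors do not coincide.
spectral_genuine = [0.9, 0.8, 0.3, 0.7]
spectral_impostor = [0.1, 0.5, 0.2, 0.4]
phase_genuine = [0.7, 0.6, 0.8, 0.2]
phase_impostor = [0.3, 0.1, 0.5, 0.2]

eer_spectral = equal_error_rate(spectral_genuine, spectral_impostor)
eer_phase = equal_error_rate(phase_genuine, phase_impostor)
eer_fused = equal_error_rate(
    fuse(spectral_genuine, phase_genuine),
    fuse(spectral_impostor, phase_impostor),
)
print(eer_spectral, eer_phase, eer_fused)
```

In this toy setup each system alone has a nonzero EER, while the fused scores separate genuine and impostor trials, because the two systems err on different trials. This is the sense in which two feature streams are complementary; the size of the gain here is an artifact of the invented scores and says nothing about the reported results.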
