Speaker Identification using FM Features

The AM-FM model of speech is a nonlinear model that has been used successfully in several areas of speech research. However, the significance of the AM-FM features extracted from this model has not been fully explored in applications such as speaker identification. This paper shows that frequency modulation (FM) features can improve speaker identification accuracy. Because amplitude modulation (AM) features are similar to conventional Mel-frequency cepstral coefficients (MFCCs), the paper focuses mainly on FM features. The correlation between FM feature components is shown to be much smaller than that between Mel filterbank log energies, which reduces the need for decorrelation. FM feature components are also shown to be very nearly Gaussian-distributed. Further, speech is synthesized from AM-FM features in order to compare four existing AM-FM demodulation methods by the perceptual quality of the synthesized speech. Of these, the Discrete Energy Separation Algorithm (DESA) gives the best synthesized speech and is therefore used as the front-end of our speaker identification system. Speaker identification experiments with FM features on the NIST 2001 database show relative improvements in identification accuracy of 2% for male speakers and 9% for female speakers over the conventional MFCC-based front-end.
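The abstract names DESA as the demodulator but gives no formulas. The following is a minimal sketch, assuming the DESA-2 variant of the energy separation algorithm of Maragos, Kaiser and Quatieri built on the discrete Teager-Kaiser energy operator; the abstract does not say which DESA variant was used. The subband filterbank is omitted, so the input is assumed to be one already band-pass-filtered subband signal. `frame_fm_feature` (mean instantaneous frequency in a frame minus the subband centre frequency) is a hypothetical definition for illustration, not necessarily the FM feature used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def teager(x: Sequence[float]) -> List[float]:
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The output has len(x) - 2 samples (one sample is lost at each end);
    output index k is centred on input sample k + 1.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x: Sequence[float], eps: float = 1e-12) -> Tuple[List[float], List[float]]:
    """DESA-2 energy separation (assumed variant) for a narrowband signal x.

    Returns per-sample instantaneous frequency (radians/sample, valid for
    0 < omega < pi/2) and amplitude envelope estimates.
    """
    # Symmetric difference y[k] = x[k+2] - x[k], centred on x[k+1].
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager(x)  # psi_x[k] centred on x[k+1]
    psi_y = teager(y)  # psi_y[k] centred on y[k+1], i.e. on x[k+2]
    omegas: List[float] = []
    amps: List[float] = []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align: both terms now centred on x[k+2]
        if px <= eps or py <= eps:
            # Energy separation is undefined here; skip instead of dividing by ~0.
            continue
        arg = 1.0 - py / (2.0 * px)
        arg = max(-1.0, min(1.0, arg))  # guard acos against rounding error
        omegas.append(0.5 * math.acos(arg))   # Omega = 0.5 * acos(1 - psi_y / (2 psi_x))
        amps.append(2.0 * px / math.sqrt(py))  # |a| = 2 psi_x / sqrt(psi_y)
    return omegas, amps


def frame_fm_feature(band: Sequence[float], fs: float, centre_hz: float) -> float:
    """Hypothetical per-frame FM feature for one subband: mean DESA-2
    instantaneous frequency in Hz minus the subband centre frequency."""
    omegas, _ = desa2(band)
    if not omegas:
        return 0.0
    mean_hz = sum(omegas) / len(omegas) * fs / (2.0 * math.pi)
    return mean_hz - centre_hz


if __name__ == "__main__":
    fs = 8000.0
    # A pure 1 kHz tone of amplitude 0.5 stands in for one subband frame.
    tone = [0.5 * math.cos(2 * math.pi * 1000.0 * n / fs) for n in range(200)]
    omegas, amps = desa2(tone)
    print(omegas[0] * fs / (2 * math.pi), amps[0])  # approximately 1000 Hz and 0.5
    print(frame_fm_feature(tone, fs, centre_hz=1000.0))  # approximately 0
```

For a sinusoid A cos(Ωn + φ) the operator gives Ψ[x] = A² sin²Ω and Ψ[y] = 4A² sin⁴Ω, so the two DESA-2 formulas recover Ω and A exactly; on real subband speech the per-sample estimates fluctuate, which is one reason to average them over a frame.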
