Combining Spectral Features of Standard and Throat Microphones for Speaker Identification

The objective of this paper is to improve the performance of a speaker recognition system by combining the speaker-specific evidence present in the spectral characteristics of standard microphone speech and throat microphone speech. Certain vocal tract spectral features extracted from these two speech signals are distinct and can be complementary to one another; they may be speech specific as well as speaker specific. This distinguishing and complementary nature of the spectral features arises from the difference in the placement of the two microphones. Autoassociative neural networks are used to model speaker characteristics based on system features represented by weighted linear prediction cepstral coefficients. The speaker recognition system based on throat microphone (TM) spectral features is comparable, though slightly less accurate, to that based on standard (or normal) microphone (NM) features. By combining the evidence from the NM- and TM-based systems using late integration, performance improves from about 91% (obtained using NM features alone) to 94% (NM and TM combined). This shows the potential of combining other speaker-specific characteristics of the NM and TM speech signals for further improvement in performance.
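The late integration described above combines the per-speaker confidence scores of the two subsystems at decision time rather than merging their features. A minimal sketch of this idea is given below; the weighting scheme, function name, and example weights are illustrative assumptions, not the paper's actual values or implementation.

```python
import numpy as np

def late_fusion_identify(nm_scores, tm_scores, w_nm=0.6, w_tm=0.4):
    """Combine per-speaker confidence scores from the NM and TM
    subsystems by weighted late integration and return the index of
    the identified speaker. The weights here are illustrative only."""
    nm = np.asarray(nm_scores, dtype=float)
    tm = np.asarray(tm_scores, dtype=float)
    # Normalise each subsystem's scores to sum to 1 so the two
    # evidence streams are on a comparable scale before fusion.
    nm = nm / nm.sum()
    tm = tm / tm.sum()
    fused = w_nm * nm + w_tm * tm
    return int(np.argmax(fused))

# Example: three enrolled speakers; both subsystems favour speaker 1.
nm_scores = [0.2, 0.7, 0.1]  # hypothetical NM confidences
tm_scores = [0.3, 0.5, 0.2]  # hypothetical TM confidences
print(late_fusion_identify(nm_scores, tm_scores))  # → 1
```

The point of fusing at the score level is that each subsystem can use its own front end and model; only a per-speaker confidence vector crosses the boundary, so the TM evidence can correct NM errors even when the two feature sets are not directly comparable.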
