Auditory model representation and comparison for speaker recognition

The TIMIT and KING databases are used to compare proven spectral processing techniques with an auditory neural representation for speaker identification. The feature sets compared are linear predictive coding (LPC) cepstral coefficients and auditory nerve firing rates from the Payton (1988) model. Two clustering algorithms, one statistically based and the other a neural approach, are used to generate speaker-specific codebook vectors: the Linde-Buzo-Gray (LBG) algorithm and a Kohonen self-organizing feature map. The resulting vector-quantized, distortion-based classification indicates that the auditory model performs statistically on par with the LPC cepstral representation in clean environments and outperforms it in noisy environments and on test data recorded over multiple sessions (i.e., with greater intra-speaker distortion).
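The codebook-based classification scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it grows a codebook with the LBG splitting procedure, then identifies a speaker by choosing the codebook that quantizes the test frames with the lowest average distortion. The feature vectors here are synthetic stand-ins for the LPC cepstral or auditory-nerve features used in the paper, and all function names and parameters (codebook size, split factor, iteration count) are illustrative choices.

```python
import numpy as np

def lbg_codebook(features, size=8, eps=0.01, iters=20):
    """Linde-Buzo-Gray: grow a codebook by splitting, refine by k-means."""
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every code vector into a slightly perturbed pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # Assign each frame to its nearest code vector (squared distance)
            d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            for k in range(len(codebook)):
                members = features[labels == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

def avg_distortion(features, codebook):
    """Mean squared distance from each frame to its nearest code vector."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).mean()

def identify(test_features, codebooks):
    """Pick the speaker whose codebook quantizes the test data best."""
    return min(codebooks, key=lambda spk: avg_distortion(test_features, codebooks[spk]))

# Toy demo: two synthetic "speakers" with well-separated feature clusters
rng = np.random.default_rng(0)
spk_a = rng.normal(0.0, 0.3, size=(200, 12))
spk_b = rng.normal(2.0, 0.3, size=(200, 12))
books = {"A": lbg_codebook(spk_a), "B": lbg_codebook(spk_b)}
test = rng.normal(2.0, 0.3, size=(50, 12))
print(identify(test, books))  # → B
```

In the paper's setup the same distortion-based decision rule is applied unchanged whether the codebooks come from LBG or from a Kohonen self-organizing map; only the codebook-training step differs.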

[1] Stephanie Seneff, "A joint synchrony/mean-rate model of auditory speech processing," 1990.

[2] Oded Ghitza et al., "Auditory neural feedback as a basis for speech processing," ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing, 1988.

[3] Hiroaki Hattori et al., "Text-independent speaker recognition using neural networks," Proceedings of ICASSP-92, 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Frank K. Soong et al., "On the use of instantaneous and transitional spectral information in speaker recognition," IEEE Trans. Acoust. Speech Signal Process., 1988.

[5] Steven K. Rogers et al., "Auditory model representation for speaker recognition," 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6] Timothy R. Anderson et al., "A comparison of auditory models for speaker independent phoneme recognition," 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7] Biing-Hwang Juang et al., "A vector quantization approach to speaker recognition," ICASSP '85, IEEE International Conference on Acoustics, Speech, and Signal Processing, 1985.

[8] Richard F. Lyon et al., "An analog electronic cochlea," IEEE Trans. Acoust. Speech Signal Process., 1988.

[9] K. Payton, "Vowel processing by a model of the auditory periphery: A comparison to eighth-nerve responses," 1988.

[10] M. Sachs et al., "Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory-nerve fibers," The Journal of the Acoustical Society of America, 1979.

[11] M. Hunt et al., "Speaker dependent and independent speech recognition experiments with an auditory model," ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing, 1988.