Gender Gates for Telephone-Based Automatic Speaker Recognition

Abstract: This work demonstrates the need to augment text-independent, telephone-based automatic speaker recognition systems with a gender gate. A range of gender gates and speech parameter types are proposed for this purpose, and they are also evaluated on speech degraded by coding and reverberation. The most accurate gender gates and speech parameters perform similarly on uncoded, coded, and reverberated speech, although which gates and parameter types are most accurate differs slightly across the three scenarios. The most robust all-round gender gates consist of either two Mahalanobis distance classifiers with fused outputs, or pitch fused with the output of one such classifier. The best all-round speech parameters are reflection coefficients and Mel-based cepstrum coefficients.
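To make the idea of a fused Mahalanobis gender gate concrete, the following is a minimal sketch and not the authors' implementation. It assumes two per-frame feature streams (for example reflection coefficients and Mel-based cepstrum coefficients), one Mahalanobis distance classifier per stream, and a simple sum-of-scores fusion rule. The abstract does not specify the fusion rule, so that choice, the class and function names, and the synthetic data are all hypothetical.

```python
import numpy as np

MALE, FEMALE = 0, 1  # label convention assumed for this sketch


class MahalanobisClassifier:
    """Per-class mean and covariance; scores frames by squared Mahalanobis distance."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)  # sorted, so column 0 is MALE, column 1 is FEMALE
        self.means_, self.inv_covs_ = [], []
        for c in self.classes_:
            Xc = X[y == c]
            self.means_.append(Xc.mean(axis=0))
            # Small ridge term keeps the covariance invertible (an assumption).
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.inv_covs_.append(np.linalg.inv(cov))
        return self

    def distances(self, X):
        """Return an (n_frames, n_classes) array of squared distances."""
        cols = []
        for mean, inv_cov in zip(self.means_, self.inv_covs_):
            d = X - mean
            cols.append(np.einsum("ij,jk,ik->i", d, inv_cov, d))
        return np.stack(cols, axis=1)


def fused_gender(clf_a, frames_a, clf_b, frames_b):
    """Fuse two classifiers by summing their mean (male - female) distance margins.

    A positive fused score means the utterance lies closer to the female model.
    This fusion rule is an illustrative assumption, not taken from the paper.
    """
    da = clf_a.distances(frames_a)
    db = clf_b.distances(frames_b)
    score = np.mean(da[:, MALE] - da[:, FEMALE]) + np.mean(db[:, MALE] - db[:, FEMALE])
    return FEMALE if score > 0 else MALE


def make_frames(rng, n, dim, shift):
    """Synthetic stand-in for real speech parameters (hypothetical data)."""
    return rng.normal(loc=shift, scale=1.0, size=(n, dim))


rng = np.random.default_rng(0)
labels = np.array([MALE] * 200 + [FEMALE] * 200)
train_a = np.vstack([make_frames(rng, 200, 4, 0.0), make_frames(rng, 200, 4, 2.0)])
train_b = np.vstack([make_frames(rng, 200, 3, 0.0), make_frames(rng, 200, 3, 2.0)])
clf_a = MahalanobisClassifier().fit(train_a, labels)
clf_b = MahalanobisClassifier().fit(train_b, labels)

# Classify one synthetic "female" utterance of 50 frames per stream.
decision = fused_gender(clf_a, make_frames(rng, 50, 4, 2.0),
                        clf_b, make_frames(rng, 50, 3, 2.0))
print("female" if decision == FEMALE else "male")
```

In a speaker recognition pipeline, the gate's decision would select the gender-specific speaker models to search. The pitch-based variant mentioned in the abstract could be sketched the same way by replacing one classifier's margin with a score derived from a pitch estimate.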
