Speaker recognition using neural networks and conventional classifiers

This paper evaluates several classifiers for text-independent speaker recognition and examines a new classifier for this application, the modified neural tree network (MNTN). The MNTN is a hierarchical classifier that combines the properties of decision trees and feedforward neural networks. It differs from the standard neural tree network (NTN) in both its learning rule and its pruning criteria. The MNTN is evaluated in several speaker recognition experiments: closed-set and open-set speaker identification, and speaker verification. The database is a subset of the TIMIT database consisting of 38 speakers from the same dialect region. The MNTN is compared with nearest neighbor classifiers, full-search and tree-structured vector quantization (VQ) classifiers, multilayer perceptrons (MLPs), and decision trees. For closed-set speaker identification, the full-search VQ classifier and the MNTN demonstrate comparable performance, and both perform significantly better than the other classifiers. The MNTN and full-search VQ classifiers are also compared in several speaker verification and open-set speaker identification experiments, where the MNTN performs better than the full-search VQ classifier. In addition to matching or exceeding the performance of the VQ classifier in these applications, the MNTN provides a logarithmic saving for retrieval.
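To make the full-search VQ baseline concrete, the following is a minimal sketch of closed-set and open-set speaker identification with one codebook per speaker. It is not the authors' implementation: the plain k-means codebook training, the squared Euclidean distortion, the synthetic 2-D feature vectors, the speaker labels `spk_a` and `spk_b`, and the rejection threshold are all assumptions chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k, iters=20, seed=0):
    """Train a k-entry codebook with plain k-means (an assumed training rule)."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, k)
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        # Move each codeword to the centroid of its cell; keep it if the cell is empty.
        codebook = [
            [sum(col) / len(cell) for col in zip(*cell)] if cell else codebook[i]
            for i, cell in enumerate(cells)
        ]
    return codebook


def avg_distortion(frames, codebook):
    """Full search: every frame is compared against every codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify(frames, codebooks, reject_threshold=None):
    """Return the speaker whose codebook gives the lowest average distortion.

    With reject_threshold set (open-set case), return None when even the
    best-matching codebook is too far from the test frames.
    """
    scores = {spk: avg_distortion(frames, cb) for spk, cb in codebooks.items()}
    best = min(scores, key=scores.get)
    if reject_threshold is not None and scores[best] > reject_threshold:
        return None, scores
    return best, scores


def toy_frames(center, n, rng):
    """Synthetic stand-in for per-frame speech features (assumption)."""
    return [[rng.gauss(c, 0.3) for c in center] for _ in range(n)]


rng = random.Random(1)
train = {
    "spk_a": toy_frames((0.0, 0.0), 200, rng),
    "spk_b": toy_frames((3.0, 3.0), 200, rng),
}
codebooks = {spk: train_codebook(vecs, k=4) for spk, vecs in train.items()}

# Closed-set: the test speaker is one of the enrolled speakers.
label, scores = identify(toy_frames((3.0, 3.0), 50, rng), codebooks)

# Open-set: an unenrolled speaker should be rejected by the threshold.
impostor, _ = identify(toy_frames((10.0, 10.0), 50, rng), codebooks, reject_threshold=1.0)
```

In this sketch each test frame costs one distance computation per codeword, so the search cost grows linearly with codebook size. A tree-structured search visits only one root-to-leaf path, which is the kind of logarithmic saving the abstract attributes to the MNTN.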
