Neural networks for discrimination and modelization of speakers

Abstract This article reviews current research on neural network systems for speaker recognition tasks. We consider two main approaches: the first relies on direct classification, the second on speaker modelization. We first present the potential of connectionist models for speaker recognition and briefly introduce the main models. We then present several recently proposed systems for speaker recognition tasks, discuss their respective performance and potential, and compare these techniques with more conventional methods such as vector quantization and hidden Markov models. The paper ends with a summary and suggestions for further developments.
