Statistical methods in multi‐speaker automatic speech recognition

Automatic speech recognition and understanding (ASR) plays an important role in man-machine communication, and substantial industrial development is currently in progress in this area. However, after some 40 years of effort, several fundamental questions remain open. This paper presents a comparative study of four methods for multi-speaker word recognition: (i) clustering of acoustic templates, (ii) comparison with a finite state automaton, (iii) dynamic programming and vector quantization, and (iv) stochastic Markov sources. To make the results comparable, the four methods were tested on the same material: the ten digits (0 to 9), each pronounced four times by 60 different speakers (30 male and 30 female). In our experiments we distinguish between multi-speaker systems (capable of recognizing words pronounced by speakers who were used during the training phase of the system) and speaker-independent systems (capable of recognizing words pronounced by speakers totally unknown to the system). Half of the corpus (15 male and 15 female speakers) was used for training, and the remainder for testing.
