This paper describes the development of a speaker-independent isolated-word recognizer for a voice dialing application operating in the car environment. Speaker-dependent and speaker-independent approaches are addressed and compared. Simple continuous hidden Markov models (HMMs) are used for speaker-dependent recognition, while multiple-codebook discrete and continuous HMMs are trained on speaker-independent reference data derived from a large database of speech collected inside several cars, under a wide variety of driving conditions, from a large number of speakers from different Italian regions. By training two separate models for each word (one for male and one for female speakers), each a 12-state continuous-density whole-word HMM with 8 diagonal-covariance Gaussians per state, and performing beam-search Viterbi decoding, a recognition rate of 99% has been obtained (65 errors out of 6423 words).
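The decoding scheme summarized above (whole-word left-to-right HMMs, diagonal-covariance Gaussian mixtures per state, separate male and female models, Viterbi scoring with beam pruning) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the fixed 0.5 transition probabilities, the beam width, and the toy 1-D, 2-state models are all assumptions chosen for illustration, and the pruning here is applied within each model rather than jointly across all word hypotheses.

```python
import math

NEG_INF = float("-inf")

def log_gmm_diag(x, weights, means, variances):
    """Log-likelihood of frame x under a diagonal-covariance Gaussian mixture."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        comp.append(ll)
    top = max(comp)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comp))

def viterbi_beam(frames, states, log_self, log_next, beam=50.0):
    """Best-path log score through a left-to-right HMM with beam pruning.

    states: one (weights, means, variances) tuple per HMM state.
    log_self, log_next: log transition probabilities, shared by all
    states (an assumption made to keep the sketch short).
    """
    n = len(states)
    score = [NEG_INF] * n
    score[0] = log_gmm_diag(frames[0], *states[0])  # must start in state 0
    for x in frames[1:]:
        threshold = max(score) - beam  # states below this are pruned
        new = [NEG_INF] * n
        for j in range(n):
            stay = score[j] + log_self if score[j] >= threshold else NEG_INF
            move = NEG_INF
            if j > 0 and score[j - 1] >= threshold:
                move = score[j - 1] + log_next
            prev = max(stay, move)
            if prev > NEG_INF:
                new[j] = prev + log_gmm_diag(x, *states[j])
        score = new
    return score[-1]  # path must end in the final state

def recognize(frames, models, beam=50.0):
    """models maps (word, gender) -> list of state GMMs; returns the best word."""
    best_word, best_score = None, NEG_INF
    for (word, _gender), states in models.items():
        s = viterbi_beam(frames, states, math.log(0.5), math.log(0.5), beam)
        if s > best_score:
            best_word, best_score = word, s
    return best_word

def state(mean):
    """Hypothetical helper: a single-Gaussian, 1-D state with unit variance."""
    return ([1.0], [[mean]], [[1.0]])

# Toy gender-dependent models for two hypothetical vocabulary words.
toy_models = {
    ("uno", "male"): [state(0.0), state(5.0)],
    ("uno", "female"): [state(0.5), state(5.5)],
    ("due", "male"): [state(5.0), state(0.0)],
    ("due", "female"): [state(5.5), state(0.5)],
}
toy_frames = [[0.1], [0.2], [4.9], [5.1]]
result = recognize(toy_frames, toy_models)
```

Taking the maximum score over both gender models of every word means the speaker's gender never has to be decided explicitly; the better-matching model simply wins. The configuration reported in the abstract would correspond to 12 states per model and 8 mixture components per state instead of the toy sizes used here.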