Markov modeling of continuous parameters in speech recognition

This paper shows how to avoid the labelling step of a speech recognition strategy based on hidden Markov models while keeping a stochastic formulation. After briefly recalling how a Markov model can be used for speech recognition, we propose another formulation in which the labels are suppressed and only continuous parameters are handled. The notion of a speech generator is then introduced, and the formulas for training and decoding are rewritten. In this new formulation, the probability densities p(x | G), where G is a generator and x an acoustic vector, must be estimated. We explain our choice of non-parametric methods, using Parzen estimators. These estimators require a kernel function, which we choose in a simple manner, and a value for the radius of the kernel, which is the key problem. A statistical solution, an information-theoretic solution, and an original topological solution are presented in turn; the last is retained. Finally, we present the results of applying this model to a 5000-word speech recognition system. The results show that the error rate can be decreased by switching from a simple labelling scheme to this continuous-parameter model.
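
As an illustration of the density estimation step, the following is a minimal sketch of a Parzen estimate of p(x | G). It assumes an isotropic Gaussian kernel, which the abstract does not specify, and takes the radius as a given argument rather than deriving it by any of the three solutions mentioned. The names `parzen_density`, `samples`, and `radius` are hypothetical.

```python
import math


def parzen_density(x, samples, radius):
    """Parzen estimate of p(x | G) at acoustic vector x.

    samples: training vectors attributed to generator G (assumed given).
    radius:  kernel radius h (assumed given; choosing it is the hard part).
    The Gaussian kernel is an assumption made for this sketch only.
    """
    if not samples:
        raise ValueError("need at least one training vector")
    if radius <= 0:
        raise ValueError("radius must be positive")
    d = len(x)
    # Normalising constant of a d-dimensional isotropic Gaussian of width h.
    norm = (2.0 * math.pi) ** (d / 2.0) * radius ** d
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * radius ** 2))
    # Average of the kernels centred on each training vector.
    return total / (len(samples) * norm)


if __name__ == "__main__":
    training = [[0.0, 0.0], [0.2, -0.1], [1.0, 1.1]]
    for h in (0.1, 0.5, 2.0):
        print(h, parzen_density([0.1, 0.0], training, h))
```

With such an estimator, a very small radius yields a spiky density concentrated on the training vectors, while a very large radius flattens the differences between generators, which is why the choice of radius matters so much.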