Markov modeling of continuous parameters in speech recognition
This paper shows how to avoid the labelling step of a speech recognition strategy based on hidden Markov models while keeping a stochastic formulation. After a brief review of how a Markov model can be used for speech recognition, we propose another formulation in which the labels are suppressed and only continuous parameters are handled. The notion of a speech generator is then introduced, and the formulas for training as well as decoding are rewritten. Under this new formulation, the probability densities p(x | G), where G is a generator and x an acoustic vector, must be estimated. We explain our choice of non-parametric methods, using Parzen estimators. These estimators require a kernel function, which we choose in a simple manner, and a value for the radius of the kernel, which is the key problem. We present in turn a statistical solution, an information-theoretic solution, and an original topological solution, and retain the last. Finally, we present the results of applying this model to a 5000-word speech recognition system. The results show that the error rate can be decreased by switching from a simple labelling scheme to this continuous-parameter model.
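To make the role of the kernel radius concrete, here is a minimal sketch of a Parzen estimate of p(x | G) built from the acoustic vectors assigned to a single generator G. The abstract does not say which kernel the authors chose, so the isotropic Gaussian kernel is an assumption, and the function name `parzen_density` and its parameters are hypothetical. None of the three radius-selection methods mentioned above is implemented here; the radius is simply passed in.

```python
import numpy as np

def parzen_density(x, samples, radius):
    """Parzen estimate of p(x | G) from the training vectors of one generator G.

    Assumption: isotropic Gaussian kernel of width `radius` (the source
    abstract does not specify the kernel actually used).
    """
    samples = np.asarray(samples, dtype=float)   # shape (n, d)
    x = np.asarray(x, dtype=float)               # shape (d,)
    n, d = samples.shape
    # Squared Euclidean distance from x to every training vector.
    sq_dist = np.sum((samples - x) ** 2, axis=1)
    # Normalising constant so that each kernel integrates to 1 over R^d.
    norm = (2.0 * np.pi * radius ** 2) ** (d / 2.0)
    # Average of the kernels centred on the training vectors.
    return float(np.mean(np.exp(-sq_dist / (2.0 * radius ** 2)) / norm))
```

With a small radius the estimate is sharply peaked around each training vector, and with a large radius it is smoothed out, which is why choosing the radius is described as the key problem.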
[1] T. Cacoullos. Estimation of a multivariate density, 1966.
[2] Larry D. Hostetler, et al. Optimization of k nearest neighbor density estimates, 1973, IEEE Trans. Inf. Theory.
[3] E. Parzen. On Estimation of a Probability Density Function and Mode, 1962.
[4] Lalit R. Bahl, et al. A Maximum Likelihood Approach to Continuous Speech Recognition, 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.