Hidden Markov chains, the forward-backward algorithm, and initial statistics

The objects listed in the title have proven to be useful and practical modeling tools in continuous speech recognition work and elsewhere. Nevertheless, there are natural and simple situations in which the forward-backward algorithm will be inadequate for its intended purpose of finding useful maximum likelihood estimates of the parameters of the distribution of a probabilistic function of a Markov chain (a "hidden Markov model" or "Markov source model"). We observe some difficulties that arise in the case of common (e.g., Gaussian) families of conditional distributions for the observables. These difficulties are due not to the algorithm itself, but to modeling assumptions that introduce singularities into the likelihood function. We also comment on the fact that the parameters of a hidden Markov model cannot, in general, be uniquely determined, even if the distribution of the observables is completely known. We close with remarks about some effects of these modeling and estimation difficulties on practical speech recognition, and about the role of initial statistics.
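
As a rough illustration of the two difficulties named above, the following Python sketch evaluates the log-likelihood of a two-state hidden Markov model with Gaussian conditional distributions, using a scaled forward pass. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `gauss_pdf` and `hmm_loglik`, the five observations, and all parameter values. The first part pins the mean of one state at the first observation and shrinks that state's standard deviation, so the log-likelihood grows without bound (a singularity of the likelihood function). The second part uses a deliberately degenerate case, two states with identical conditional distributions, to show two different transition matrices that assign the same likelihood to the data; this is only one simple instance of parameters that the observables cannot determine.

```python
import math


def gauss_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hmm_loglik(obs, init, trans, means, sds):
    """Log-likelihood of obs under a Gaussian HMM, via a scaled forward pass."""
    n = len(init)
    # Forward variables at time 0: initial probability times emission density.
    alpha = [init[i] * gauss_pdf(obs[0], means[i], sds[i]) for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emissions.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n))
                * gauss_pdf(obs[t], means[j], sds[j])
                for j in range(n)
            ]
        # Rescale to avoid underflow and accumulate the log of the scale factor.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


# Hypothetical data and parameters, chosen only for illustration.
obs = [0.3, -1.2, 0.8, 2.1, -0.5]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.5, 0.5]]

# Part 1: singularity. State 0 has its mean pinned at obs[0]; shrinking its
# standard deviation makes the density at obs[0] arbitrarily large, while
# state 1 (standard normal) still explains the remaining observations.
means = [obs[0], 0.0]
shrinking_sds = [1e-2, 1e-4, 1e-6, 1e-8]
logliks = [hmm_loglik(obs, init, trans, means, [sd0, 1.0]) for sd0 in shrinking_sds]
for sd0, ll in zip(shrinking_sds, logliks):
    print(f"sd of state 0 = {sd0:g}: log-likelihood = {ll:.3f}")

# Part 2: non-identifiability in a degenerate case. When both states emit from
# the same distribution, the transition matrix does not affect the likelihood.
same_means, same_sds = [0.0, 0.0], [1.0, 1.0]
ll_a = hmm_loglik(obs, init, [[0.9, 0.1], [0.5, 0.5]], same_means, same_sds)
ll_b = hmm_loglik(obs, init, [[0.2, 0.8], [0.7, 0.3]], same_means, same_sds)
print(f"log-likelihood with transition matrix A = {ll_a:.6f}")
print(f"log-likelihood with transition matrix B = {ll_b:.6f}")
```

In the first part, each hundredfold reduction of the standard deviation adds roughly log(100) to the log-likelihood, so no finite maximum exists along that direction. An iterative maximum likelihood procedure started near such a point can be drawn toward it, and that behavior reflects the model family rather than a defect of the procedure.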