A minimum discrimination information approach for hidden Markov modeling

An iterative approach for minimum-discrimination-information (MDI) hidden Markov modeling of information sources is proposed. The approach is developed for sources characterized by a given set of partial covariance matrices and for hidden Markov models (HMMs) with Gaussian autoregressive output probability distributions (PDs). It aims to estimate the HMM that yields the MDI with respect to all sources that could have produced the given set of partial covariance matrices. Each iteration of the MDI algorithm generates a new HMM in two steps. First, a PD for the source is estimated by minimizing the discrimination information measure with respect to the old model over all PDs that satisfy the given set of partial covariance matrices. Second, a new model is found whose discrimination information measure with respect to the estimated source PD is lower than that of the old model. The problem of estimating the PD of the source is formulated as a standard constrained minimization problem in Euclidean space. The new model is estimated from the PD of the source by a procedure that generalizes the Baum algorithm. The MDI approach is shown to be a descent algorithm for the discrimination information measure, and its local convergence is proved.
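The two-step iteration described above can be illustrated with a minimal toy sketch. It is not the paper's algorithm: it assumes a discrete 3x3 alphabet instead of Gaussian autoregressive HMMs, a single moment constraint E[xy] = target standing in for the partial covariance matrices, and a family of product distributions standing in for the HMM family. The model step here is an exact minimization, whereas the abstract only requires a decrease. All function names and parameter values are hypothetical.

```python
import numpy as np

def kl(p, q):
    """Discrimination information D(p || q); q is assumed strictly positive."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def project_onto_constraint(q, f, target, iters=200):
    """Source step: minimize D(p || q) over all p with E_p[f] = target.

    The minimizer is an exponential tilt p ~ q * exp(lam * f); the tilted
    mean is increasing in lam, so lam is found by bisection.
    """
    lo, hi = -50.0, 50.0  # assumed bracket, wide enough for this toy

    def tilt(lam):
        logw = np.log(q) + lam * f
        w = np.exp(logw - logw.max())  # shift for numerical stability
        return w / w.sum()

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if float(np.sum(tilt(mid) * f)) < target:
            lo = mid
        else:
            hi = mid
    return tilt(0.5 * (lo + hi))

def fit_product_model(p):
    """Model step: the product of p's marginals minimizes D(p || q)
    over all product distributions q."""
    return np.outer(p.sum(axis=1), p.sum(axis=0))

def run_alternating_minimization(target=1.5, n_iter=20):
    x = np.arange(3)
    f = np.outer(x, x).astype(float)   # constraint function f(x, y) = x * y
    q = np.full((3, 3), 1.0 / 9.0)     # initial model: uniform
    history = []
    for _ in range(n_iter):
        p = project_onto_constraint(q, f, target)  # estimate source PD
        q = fit_product_model(p)                   # update the model
        history.append(kl(p, q))                   # D(p_k || q_k)
    return p, q, f, history

if __name__ == "__main__":
    p, q, f, history = run_alternating_minimization()
    print("D(p || q) per iteration:", history)
```

In this toy setting the recorded values of D(p || q) never increase, because each step minimizes the same quantity over one of its two arguments. This mirrors, under the stated assumptions, the descent property claimed for the MDI approach.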
