A Markov random field based multi-band model

An extension of the multi-band model that includes inter-band control of time asynchrony is described. It is based on the framework of Markov random fields. The law of the speech process is given by a parametric Gibbs distribution, and a maximum likelihood parameter estimation algorithm is developed. This random field model is applied to isolated word recognition. In the mono-band case, the new model performs similarly to standard HMM techniques. In the multi-band case, the recognition rate decreases as the number of bands increases, but modeling inter-band synchrony limits this decrease.
