Products of Hidden Markov Models

We present products of hidden Markov models (PoHMMs), a way of combining HMMs to form a distributed-state time series model. Inference in a PoHMM is tractable and efficient. Learning of the parameters, although intractable, can be done effectively using the Product of Experts learning rule. The distributed state helps the model to explain data that has multiple causes, and because each model need only explain part of the data, a PoHMM can capture longer-range structure than a single HMM can. We show results on modelling character strings, a simple language task, and the symbolic family trees problem, which highlight these advantages.
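The contrast between tractable inference and intractable learning can be illustrated with a minimal sketch. It assumes discrete-output HMMs whose parameters are plain Python lists, and it treats the unnormalized log-probability of a sequence under the product as the sum of the per-HMM forward log-likelihoods. All function names here are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def hmm_log_likelihood(obs, init, trans, emit):
    """Forward algorithm: log p(obs) under one discrete HMM.

    Assumed layout: init[i], trans[i][j], emit[i][symbol] are probabilities,
    and obs is a non-empty sequence of integer symbols.
    """
    n = len(init)
    alpha = [safe_log(init[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for sym in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][sym])
            for j in range(n)
        ]
    return logsumexp(alpha)


def pohmm_unnormalized_log_prob(obs, hmms):
    """Sum of per-HMM log-likelihoods: one forward pass per HMM."""
    return sum(hmm_log_likelihood(obs, *hmm) for hmm in hmms)


def brute_force_log_partition(hmms, num_symbols, length):
    """Log normalizer by enumerating every sequence of a given length.

    The cost grows as num_symbols ** length, so this is only feasible for
    toy sizes; it is here to show where the intractability comes from.
    """
    return logsumexp([
        pohmm_unnormalized_log_prob(seq, hmms)
        for seq in itertools.product(range(num_symbols), repeat=length)
    ])


# Two toy HMMs (illustrative parameters), each a tuple (init, trans, emit).
hmm_a = ([0.6, 0.4], [[0.7, 0.3], [0.2, 0.8]], [[0.9, 0.1], [0.3, 0.7]])
hmm_b = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.2, 0.8], [0.6, 0.4]])

score = pohmm_unnormalized_log_prob([0, 1, 1], [hmm_a, hmm_b])
log_z = brute_force_log_partition([hmm_a, hmm_b], num_symbols=2, length=3)
print(score - log_z)  # normalized log-probability of the sequence [0, 1, 1]
```

Scoring a sequence costs one forward pass per HMM, so it scales linearly in the number of experts. The normalizer, by contrast, is a sum over all possible sequences, which is why exact maximum-likelihood learning is impractical beyond toy sizes.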
