Input/Output HMMs: A Recurrent Bayesian Network View

This paper reviews Markovian models for sequence processing tasks, with particular emphasis on input/output hidden Markov models (IOHMMs) for supervised learning on temporal domains. HMMs and IOHMMs are viewed as special cases of belief networks that might be called recurrent Bayesian networks. This view opens the way to more general architectures that learn probabilistic relationships among several data streams (rather than just an input and an output stream), or that exploit multiple hidden state variables. By introducing the concept of belief network unfolding, it is shown that recurrent Bayesian networks operating on discrete domains are equivalent to recurrent neural networks with higher-order connections and linear units.
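To make the IOHMM structure concrete, the sketch below (not taken from the paper) implements the forward recursion for a discrete IOHMM with tabular conditional distributions; the names `iohmm_forward`, `A`, `B`, and `pi` are illustrative, not the paper's notation. For a fixed input symbol the update is linear in the forward variable, and the multiplicative interaction between the input-selected weights and the previous state distribution is exactly the kind of higher-order connection the unfolding argument refers to.

```python
import numpy as np

def iohmm_forward(A, B, pi, inputs, outputs):
    """Forward (alpha) recursion for a discrete input/output HMM.

    A[u] : (n, n) input-conditioned transitions, A[u][i, j] = P(x_t=i | x_{t-1}=j, u_t=u)
    B[u] : (n, m) input-conditioned emissions,   B[u][i, y] = P(y_t=y | x_t=i, u_t=u)
    pi   : (n,) initial state distribution
    inputs, outputs : integer sequences of equal length
    Returns log P(outputs | inputs) under the model.
    """
    alpha = pi * B[inputs[0]][:, outputs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for u, y in zip(inputs[1:], outputs[1:]):
        # Linear propagation with input-selected weights: the product of
        # A[u] with the previous alpha is the second-order interaction.
        alpha = B[u][:, y] * (A[u] @ alpha)
        # Rescale to avoid underflow, accumulating the log-likelihood.
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik

# Hypothetical two-state model with binary inputs and outputs.
A = {0: np.array([[0.9, 0.2], [0.1, 0.8]]),
     1: np.array([[0.3, 0.6], [0.7, 0.4]])}
B = {0: np.array([[0.8, 0.2], [0.1, 0.9]]),
     1: np.array([[0.5, 0.5], [0.3, 0.7]])}
pi = np.array([0.5, 0.5])
print(iohmm_forward(A, B, pi, inputs=[0, 1, 1, 0], outputs=[1, 0, 1, 1]))
```

Setting every `A[u]` and `B[u]` equal, independently of `u`, recovers the standard HMM forward pass, which is the sense in which HMMs are the special case noted above.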
