Probabilistic neural network models for sequential data

Artificial neural networks (ANN) can be incorporated into probabilistic models. In this paper we review some of the approaches which have been proposed to incorporate them into probabilistic models of sequential data, such as hidden Markov models (HMM). We also discuss new developments and new ideas in this area, in particular how ANN can be used to model high-dimensional discrete and continuous data to deal with the curse of dimensionality and how the ideas proposed in these models could be applied to statistical language modeling to represent longer-term context than allowed by trigram models, while keeping word-order information.

[1]  Isabelle Guyon,et al.  Comparison of classifier methods: a case study in handwritten digit recognition , 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5).

[2]  Jack Perkins,et al.  Pattern recognition in practice , 1980 .

[3]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[4]  Peter E. Hart,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[5]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data , 1980 .

[6]  Brendan J. Frey,et al.  Graphical Models for Machine Learning and Digital Communication , 1998 .

[7]  Halbert White,et al.  Learning in Artificial Neural Networks: A Statistical Perspective , 1989, Neural Computation.

[8]  Steven J. Nowlan,et al.  Mixtures of Controllers for Jump Linear and Non-Linear Plants , 1993, NIPS.

[9]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[10]  Yoshua Bengio,et al.  An Input Output HMM Architecture , 1994, NIPS.

[11]  Samy Bengio,et al.  Taking on the curse of dimensionality in joint distributions using neural networks , 2000, IEEE Trans. Neural Networks Learn. Syst..

[12]  Christopher M. Bishop,et al.  Neural networks for pattern recognition , 1995 .

[13]  Samy Bengio,et al.  An EM Algorithm for Asynchronous Input/Output Hidden Markov Models , 1996 .

[14]  H. Schütze,et al.  Dimensions of meaning , 1992, Supercomputing '92.

[15]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[16]  Risto Miikkulainen,et al.  Natural Language Processingwith Modular Neural Networks and Distributed Lexicon , 1991 .

[17]  Risto Miikkulainen,et al.  Natural Language Processing With Modular PDP Networks and Distributed Lexicon , 1991, Cogn. Sci..

[18]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[19]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[20]  Michael I. Jordan,et al.  Learning Fine Motion by Markov Mixtures of Experts , 1995, NIPS.