Recurrent neural networks, hidden Markov models and stochastic grammars

The advantages of using a linear recurrent network to encode and recognize sequential data are discussed. The hidden Markov model (HMM) is shown to be a special case of such linear recurrent second-order neural networks. The Baum-Welch reestimation formula, which has proved very useful in training HMMs, can also be used to train a linear recurrent network. As an example, a network successfully learned the stochastic Reber grammar from only a few hundred sample strings in about 14 iterations. The relative merits and limitations of the Baum-Welch optimal-ascent algorithm in comparison with the error-correction, gradient-descent learning algorithm are discussed.
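The abstract does not include code. As an illustrative sketch only, the following NumPy snippet samples strings from the stochastic Reber grammar (each grammar state choosing between its two outgoing arcs with equal probability) and fits a standard discrete-output HMM by Baum-Welch reestimation. The grammar transition table, the state/symbol counts, the function names (reber_string, baum_welch), and the training setup (300 strings, 14 iterations) are assumptions made here for illustration, not the authors' linear-recurrent-network implementation.

import numpy as np

# Stochastic Reber grammar: each state offers two equally likely moves,
# written as (emitted symbol, next grammar state); state 5 terminates.
REBER = {
    0: [('T', 1), ('P', 2)],
    1: [('S', 1), ('X', 3)],
    2: [('T', 2), ('V', 4)],
    3: [('X', 2), ('S', 5)],
    4: [('P', 3), ('V', 5)],
}
SYMBOLS = ['B', 'T', 'P', 'S', 'X', 'V', 'E']
SYM2ID = {s: i for i, s in enumerate(SYMBOLS)}

def reber_string(rng):
    """Sample one string from the stochastic Reber grammar."""
    out, state = ['B'], 0
    while state != 5:
        sym, state = REBER[state][rng.integers(2)]
        out.append(sym)
    out.append('E')
    return out

def baum_welch(seqs, n_states=7, n_iter=14, seed=0):
    """Plain Baum-Welch reestimation for a discrete-output HMM."""
    rng = np.random.default_rng(seed)
    M = len(SYMBOLS)
    # Random row-stochastic initial parameters.
    pi = rng.random(n_states); pi /= pi.sum()
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, M)); B /= B.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        pi_acc = np.zeros(n_states)
        A_num = np.zeros((n_states, n_states)); A_den = np.zeros(n_states)
        B_num = np.zeros((n_states, M))
        for obs in seqs:
            T = len(obs)
            # Scaled forward pass.
            alpha = np.zeros((T, n_states)); c = np.zeros(T)
            alpha[0] = pi * B[:, obs[0]]
            c[0] = alpha[0].sum(); alpha[0] /= c[0]
            for t in range(1, T):
                alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
                c[t] = alpha[t].sum(); alpha[t] /= c[t]
            # Scaled backward pass.
            beta = np.zeros((T, n_states)); beta[-1] = 1.0
            for t in range(T - 2, -1, -1):
                beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
            # State and transition posteriors, accumulated over sequences.
            gamma = alpha * beta
            gamma /= gamma.sum(axis=1, keepdims=True)
            pi_acc += gamma[0]
            for t in range(T - 1):
                A_num += alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / c[t + 1]
            A_den += gamma[:-1].sum(axis=0)
            for t in range(T):
                B_num[:, obs[t]] += gamma[t]
        # Reestimation step (the Baum-Welch update).
        pi = pi_acc / len(seqs)
        A = A_num / A_den[:, None]
        B = B_num / B_num.sum(axis=1, keepdims=True)
    return pi, A, B

rng = np.random.default_rng(0)
train = [[SYM2ID[s] for s in reber_string(rng)] for _ in range(300)]
pi, A, B = baum_welch(train)

After training, the learned emission matrix B typically becomes sparse, with each hidden state emitting one or two grammar symbols, which is one way to check that the model has captured the grammar's structure.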