Improvement of Learning in Recurrent Networks by Substituting the Sigmoid Activation Function

Several recurrent network architectures have been devised in recent years to deal with sequential tasks. One such model is the Simple Recurrent Network (SRN) proposed by Elman (Elman, 1988). The standard backpropagation rule was employed for learning in the first published works with SRNs, e.g. (Cleeremans et al., 1989). Later, full-gradient learning schemes such as real-time recurrent learning (RTRL) and backpropagation through time (BPTT) were proposed for training fully connected recurrent networks. These algorithms can also be used to train the weights of the recurrent hidden layer in SRNs.
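To make the architecture concrete, the following is a minimal sketch of an Elman-style SRN forward pass with sigmoid hidden units, as typically described. All names, dimensions, and the initialization scheme here are illustrative assumptions, not taken from the cited sources:

```python
import numpy as np

def sigmoid(x):
    """Standard logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-x))

class SimpleRecurrentNetwork:
    """Illustrative Elman-style SRN: the hidden layer receives the current
    input together with a copy of its own activation from the previous
    time step (the 'context' units)."""

    def __init__(self, n_in, n_hidden, n_out, rng=None):
        rng = rng or np.random.default_rng(0)
        # Small random initialization (an assumption for this sketch).
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))       # input -> hidden
        self.W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))     # hidden -> output
        self.h = np.zeros(n_hidden)                              # context units

    def step(self, x):
        # Hidden state depends on the current input and the previous
        # hidden state held in the context units.
        self.h = sigmoid(self.W_in @ x + self.W_rec @ self.h)
        return sigmoid(self.W_out @ self.h)

# Usage: feed a short one-hot sequence through the network.
net = SimpleRecurrentNetwork(n_in=4, n_hidden=8, n_out=2)
outputs = [net.step(x) for x in np.eye(4)]
```

In this sketch, only the forward pass is shown; the recurrent weights `W_rec` are the ones that full-gradient schemes such as RTRL or BPTT would train by unrolling or accumulating gradients through time.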