Empirical Demonstration: Combining EM and RVFL

The GM-RVFL network and the EM training scheme derived in the previous chapter are applied to the stochastic time series of Sections 4.2 and 4.3. The prediction performance is found to depend critically on the width of the distribution from which the random weights are drawn: too small a value leaves the network unable to learn the non-linearities of the time series, whereas too large a value degrades the performance through excessive non-linearity and overfitting. However, the training process is accelerated by about two orders of magnitude, so that a whole ensemble of networks can be trained at the computational cost otherwise required for training a single model. In this way, ‘good’ values for the distribution width can easily be obtained by a discrete random search. Combining the best models in a committee improves the generalisation performance over that obtained with an individual fully-adaptable model. For the double-well time series of Section 4.3, a committee of GM-RVFL networks is found to outperform all alternative models otherwise applied to this problem.
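
The search-and-combine procedure described above can be illustrated with a minimal sketch. It rests on several simplifying assumptions and is not the GM-RVFL model itself: the network is a plain RVFL regressor whose output weights are fitted by regularised least squares rather than by EM on a Gaussian mixture, the data are a hypothetical noisy sine function rather than the time series of Sections 4.2 and 4.3, and all names and settings (`fit_rvfl`, `design`, the ensemble size of 30, the committee size of 5, the search range for the width) are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (an assumption, not the time series of the text).
x = rng.uniform(-2.0, 2.0, size=400)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=400)
X = x[:, None]
X_train, y_train = X[:300], y[:300]
X_val, y_val = X[300:], y[300:]


def design(X, W, b):
    """Hidden-unit activations plus direct input links and a bias column."""
    hidden = np.tanh(X @ W + b)
    ones = np.ones((X.shape[0], 1))
    return np.hstack([hidden, X, ones])


def fit_rvfl(X, y, sigma, n_hidden=20, ridge=1e-6):
    """Draw fixed random hidden weights of width sigma, then solve a
    linear least-squares problem for the output weights only."""
    W = rng.normal(0.0, sigma, size=(X.shape[1], n_hidden))
    b = rng.normal(0.0, sigma, size=n_hidden)
    H = design(X, W, b)
    A = H.T @ H + ridge * np.eye(H.shape[1])  # small ridge term for stability
    beta = np.linalg.solve(A, H.T @ y)
    return W, b, beta


def predict(model, X):
    W, b, beta = model
    return design(X, W, b) @ beta


def mse(model, X, y):
    return float(np.mean((predict(model, X) - y) ** 2))


# Discrete random search: one cheap network per randomly drawn width,
# sampled log-uniformly between 0.01 and 100.
sigmas = 10.0 ** rng.uniform(-2.0, 2.0, size=30)
models = [fit_rvfl(X_train, y_train, s) for s in sigmas]
val_errors = np.array([mse(m, X_val, y_val) for m in models])

# Committee: average the predictions of the five best networks.
best = np.argsort(val_errors)[:5]
committee = [models[i] for i in best]
committee_pred = np.mean([predict(m, X_val) for m in committee], axis=0)
committee_mse = float(np.mean((committee_pred - y_val) ** 2))
```

Because the hidden weights stay fixed, each network costs only one linear solve, which is what makes training a whole ensemble affordable. In this sketch the committee members are selected and evaluated on the same validation set, so `committee_mse` is optimistic; an unbiased estimate of the generalisation performance would need a separate test set.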