Teaching Feed-Forward Neural Networks by Simulated Annealing

Abstract. Simulated annealing is applied to the problem of teaching feed-forward neural networks with discrete-valued weights. Network performance is optimized by repeated presentation of training data at lower and lower temperatures. Several examples, including the parity and "clump-recognition" problems, are treated, scaling with network complexity is discussed, and the viability of mean-field approximations to the annealing process is considered.

1. Introduction

Back propagation [1] and related techniques have focused attention on the prospect of effective learning by feed-forward neural networks with hidden layers. Most current teaching methods suffer from one of several problems, including the tendency to get stuck in local minima and poor performance in large-scale examples. In addition, gradient-descent methods are applicable only when the weights can assume a continuum of values. The solutions reached by back propagation are sometimes only marginally stable against perturbations, and rounding off weights after or during the procedure can severely affect network performance. If the weights are restricted to a few discrete values, an alternative is required.

At first sight, such a restriction seems counterproductive. Why decrease the flexibility of the network any more than necessary? One answer is that truly tunable analog weights are still a bit beyond the capabilities of current VLSI technology. New techniques will no doubt be developed, but there are other, more fundamental reasons to prefer discrete-weight networks. Application of back propagation often results in weights that vary greatly, must be precisely specified, and embody no particular pattern. If the network is to incorporate structured rules underlying the examples it has learned, the weights ought often to assume regular, integer values.
Examples of such very structured sets of weights include the "human solution" to the "clump recognition" problem [2] discussed in section 2.2, and the configuration presented in reference [1] that solves the parity problem. The relatively low number of
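
As an illustration of the general idea summarized in the abstract (discrete weights adjusted by annealing at progressively lower temperatures), the following is a minimal sketch and not the procedure used in this paper. Everything in it is an assumption made for illustration: the 2-2-1 threshold network, the allowed weight values in `LEVELS`, the two-bit parity training set, the geometric cooling schedule, and the names `step`, `forward`, `error`, and `anneal`.

```python
import math
import random

LEVELS = (-1, 0, 1)  # allowed discrete weight values (assumed for this sketch)
PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # 2-bit parity


def step(z):
    """Hard threshold unit: fires only for strictly positive input."""
    return 1 if z > 0 else 0


def forward(w, x):
    """2-2-1 network; w is a flat list of 9 discrete weights and biases."""
    h1 = step(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = step(w[3] * x[0] + w[4] * x[1] + w[5])
    return step(w[6] * h1 + w[7] * h2 + w[8])


def error(w):
    """Number of misclassified training patterns, used as the energy."""
    return sum(forward(w, x) != t for x, t in PATTERNS)


def anneal(steps_per_temp=200, t_start=2.0, t_end=0.05, cooling=0.9, seed=0):
    """Anneal the discrete weights; returns the best weights and their error."""
    rng = random.Random(seed)
    w = [rng.choice(LEVELS) for _ in range(9)]
    e = error(w)
    best_w, best_e = list(w), e
    t = t_start
    while t > t_end and best_e > 0:
        for _ in range(steps_per_temp):
            # Propose changing one randomly chosen weight to a different level.
            i = rng.randrange(len(w))
            old = w[i]
            w[i] = rng.choice([v for v in LEVELS if v != old])
            e_new = error(w)
            # Metropolis rule: accept improvements, sometimes accept worse moves.
            if e_new <= e or rng.random() < math.exp((e - e_new) / t):
                e = e_new
                if e < best_e:
                    best_w, best_e = list(w), e
            else:
                w[i] = old  # reject the move and restore the weight
        t *= cooling  # lower the temperature geometrically
    return best_w, best_e
```

Calling `weights, errors = anneal(seed=0)` returns the best configuration visited and its number of misclassified patterns. Within this sketch's conventions, one structured zero-error configuration is `[1, 1, 0, 1, 1, -1, 1, -1, 0]`, in which the first hidden unit computes OR, the second computes AND, and the output fires when only the first is active.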

[1] Physical Review, 1965, Nature.

[2] Physics Letters, 1962, Nature.

[3] H. Kalmus, Biological Cybernetics, 1972, Nature.