The ups and downs of Hebb synapses.

Abstract

Modelers have come up with many different learning rules for neural networks. When a teacher specifies the correct output, error-driven rules work better than pure Hebb rules, in which the changes in synapse strength depend on the correlation between pre- and postsynaptic activities. But for unsupervised learning, Hebb rules can be very effective if they are combined with suitable normalization or "unlearning" terms that prevent the synapses from growing without bound. Hebb rules that use rates of change of activity instead of activity itself are useful for discovering perceptual invariants and may also provide a way of implementing error-driven learning.

It would be truly wonderful if randomly connected neural networks could turn themselves into useful computing devices by using some simple rule to modify the strengths of their synapses. This was the hope that lay behind the original Hebb learning rule, and it is the vision that has driven neural network modelers for half a century. Initially, researchers simply simulated various rules to see what would happen. After a decade or two of messing around, they realized that there was a much better way to explore the space of possible learning rules: first write down an objective function (a quantitative definition of how well the network is performing) and then use elementary calculus to derive a learning rule that will improve the objective function. For the last few decades, the big theoretical advances in learning rules for neural networks have been associated with new optimization methods and new ideas about what objective function should be optimized.

If we think of a neural network as a device for converting input vectors into output vectors, one sensible objective is to minimize some measure of the difference between the output the network actually produces and the output it ought to produce. This approach led to effective "error-driven" learning rules such as the Widrow-Hoff rule (Widrow & Hoff, 1960) and the perceptron convergence procedure (Rosenblatt, 1961), and it was later generalized to multilayer networks by using backpropagation of the errors to get training signals for intermediate "hidden" layers (Rumelhart, Hinton, & Williams, 1986). Within the neural network community, the "Hebbian" approach of using the product of pre- and postsynaptic activities to drive learning was seen as inferior to error-driven methods that use the product of the presynaptic activity and the postsynaptic error derivative, that is, the rate at which the objective function changes as the postsynaptic activity is changed. Even when the task was merely to associate random input vectors with random output vectors, an error-driven rule was shown to work much better than a Hebbian rule.

Unfortunately, error-driven learning has some serious drawbacks. It requires a teacher to specify the right answer, and it is hard to see how neurons could implement the backpropagation required by multilayer versions. It is possible to get the teaching signal from the data itself by trying to predict the next term in a temporal sequence (Elman, 1991) or by trying to reconstruct the input data at the output (Hinton, 1989), but it is also possible to use quite different objective functions for learning. Some of these alternative objective functions lead to learning rules that are far more Hebbian in flavour.
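To make the contrast between the two kinds of rule concrete, here is a minimal sketch for a single linear neuron with a squared-error objective. The variable names, learning rate, and dimensions are illustrative assumptions of ours, not anything taken from the papers cited above.

import numpy as np

rng = np.random.default_rng(0)
lr = 0.01                            # learning rate (our choice)
x = rng.standard_normal(10)          # presynaptic activities
t = 1.0                              # teacher-specified target output
w = 0.1 * rng.standard_normal(10)    # synapse strengths

y = w @ x                            # postsynaptic activity of a linear neuron

# Hebbian update: the weight change is the product of
# pre- and postsynaptic activities.
w_hebb = w + lr * x * y

# Error-driven (Widrow-Hoff) update: the weight change is the product of
# the presynaptic activity and the error (t - y), which is minus the
# derivative of the squared-error objective 0.5 * (t - y)**2 with
# respect to the postsynaptic activity y.
w_delta = w + lr * x * (t - y)

Note that only the error-driven update can reduce the output error toward a teacher-specified target; the pure Hebbian update simply strengthens whatever correlation is already present, which is why it needs a teacher-free objective to be useful.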
A common objective in processing high-dimensional data is to reduce the dimensionality without losing the ability to reconstruct the raw data from the reduced representation. If we measure the accuracy of the reconstruction by the squared error, the optimal strategy is to extract the principal components, the dominant directions of variation in the data. Oja (1982) showed how to extract the first principal component using Hebbian learning to maximize the squared output of a neuron, combined with normalization of the synapse strengths to prevent them from growing without bound. …
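The following is a minimal sketch of Oja's rule in the spirit just described; the synthetic data, dimensions, and learning rate are illustrative assumptions of ours, not details from Oja (1982).

import numpy as np

rng = np.random.default_rng(1)

# Synthetic data whose dominant direction of variation is the first axis.
cov = np.diag([3.0, 1.0, 0.5, 0.2, 0.1])
X = rng.standard_normal((5000, 5)) @ np.linalg.cholesky(cov).T

w = 0.1 * rng.standard_normal(5)     # synapse strengths
lr = 0.005
for x in X:
    y = w @ x                        # the rule maximizes the squared output y**2
    # Oja's rule: the Hebbian term y*x grows w along the data's dominant
    # direction of variation; the decay term -(y**2)*w approximately
    # normalizes w, preventing unbounded growth.
    w += lr * (y * x - (y * y) * w)

print(w / np.linalg.norm(w))         # ~ (+/-1, 0, 0, 0, 0): first principal component

The decay term is what distinguishes this from a pure Hebb rule: without it, w would grow without bound, and with it, w converges to a unit vector along the first principal component of the data.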