Learning in massively parallel nets

The human brain is very different from a conventional digital computer. It relies on massive parallelism rather than raw speed, and it stores long-term knowledge by modifying the way its processing elements interact rather than by setting bits in a passive, general-purpose memory. It is robust against minor physical damage and it learns from experience instead of being explicitly programmed. We do not yet know how the brain uses the activities of neurons to represent complex, articulated structures, or how the perceptual system turns the raw input into useful internal representations so rapidly. Nor do we know how the brain learns new representational schemes. But over the past few years there have been many new and interesting theories about these issues. Much of the theorizing has been motivated by the belief that the brain is using computational principles which could also be applied to massively parallel artificial systems, if only we knew what the principles were.

In this talk, I shall focus on the issue of learning. Early research on perceptrons and associative nets (or matrix memories) showed how to set the weights of the connections between input units and output units so that a pattern of activity on the input units would cause the desired pattern of activity on the output units. A variant, called the auto-associative net, did not distinguish between input and output units. It modified the weights of the pairwise interconnections among the units to ensure that any sufficiently large part of a stored pattern could recreate the rest. Recently, Hopfield has developed an interesting way of analyzing the behavior of iterative, auto-associative nets, but research on simple associative networks is generally of limited interest because most interesting tasks are too complex to be performed by auto-association or by direct connections from the input units to the output units.
Many intervening layers of "hidden" units are generally required, and the hard learning problem is to decide how to use these hidden units. This is so difficult because the network is required to invent its own representational scheme, and the space of possible schemes is immense, even if we restrict ourselves to those that can be implemented conveniently in networks of neuron-like units.
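
As a concrete illustration of the auto-associative net mentioned above, the following is a minimal sketch of pattern completion, not the method of any particular paper. It assumes units that take the values +1 and -1 (with 0 standing for "unknown" in a cue), a Hebbian outer-product rule for setting the pairwise weights, and iterative, one-unit-at-a-time updating. The function names `train` and `recall` and the two toy patterns are hypothetical and chosen only for this example.

```python
def train(patterns):
    """Set symmetric pairwise weights with a Hebbian outer-product rule.

    Assumption: w[i][j] is the sum over stored patterns of p[i] * p[j],
    with no self-connections (w[i][i] stays 0).
    """
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, cue, max_sweeps=10):
    """Update units one at a time until no unit changes (or sweeps run out).

    Each unit adopts the sign of its total weighted input; a unit whose
    total input is exactly 0 keeps its current value.
    """
    state = list(cue)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            total = sum(w[i][j] * state[j] for j in range(n))
            if total != 0:
                new_value = 1 if total > 0 else -1
                if new_value != state[i]:
                    state[i] = new_value
                    changed = True
        if not changed:
            break
    return state


# Two toy patterns over eight units (illustrative data only).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
w = train(stored)

# A partial cue: the first six units of the first pattern, last two unknown.
cue = [1, 1, 1, 1, -1, -1, 0, 0]
completed = recall(w, cue)
print(completed)  # [1, 1, 1, 1, -1, -1, -1, -1]
```

Here the six known units are enough to recreate the missing two. Note that there are no hidden units in this sketch: every unit is part of the stored pattern, which is exactly the limitation that motivates the harder learning problem described above.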