Unsupervised Learning: Foundations of Neural Computation--A Review

ral networks in the 1980s was primarily fueled by supervised learning, exemplified by the backpropagation algorithm. In supervised learning, a desired output signal is provided to the learner together with an input signal, and the system adjusts its parameters so that its future response will be closer to the desired signal. Although supervised learning has been dominant in machine learning, much of our intelligence, perception in particular, is acquired without a teacher. Through mere exposure, humans and animals learn how to analyze their environments and recognize relevant objects and events. Consider, for example, our ability to sort apples from oranges by their appearance, an ability that can be gained before we learn their names. This kind of analysis calls for unsupervised learning: learning without a teacher, also known as self-organization. Unsupervised learning has been studied in neural networks since the early days. In recent years, however, the research focus has shifted steadily from supervised to unsupervised learning, and the latter has become a predominant subject in neural networks.
Unsupervised Learning: Foundations of Neural Computation is a collection of 21 papers published in the journal Neural Computation in the 10-year period since its founding in 1989 by Terrence Sejnowski. Neural Computation has become the leading journal of its kind. The editors of the book are Geoffrey Hinton and Terrence Sejnowski, two pioneers in neu [...] giving external instruction? There is no simple answer to this critical question. In fact, many different objectives have been proposed, including discovering clusters in the input data, extracting features that characterize the input data more compactly, and uncovering nonaccidental coincidences within the input data. Beneath these objectives is the fundamental task of representation: unsupervised learning attempts to derive hidden structure from the raw data.
This endeavor is meaningful because input data are far from random; they are produced by physical processes. For example, a picture taken by a camera reflects the luminance of the physical objects that constitute the visual scene, and an audio recording reflects acoustic events in the auditory scene. Physical processes tend to be coherent: an object occupies a connected region of space, has a smooth surface, moves continuously, and so on. From the standpoint of information theory, physical objects and events tend to have limited complexity and can be described in a small number of bits. This observation is, in my view, the foundation of unsupervised learning. Because perception is concerned with recovering the physical causes of the input data, a better representation should reveal more of the underlying physical causes. Physical causes are hidden in the data, and they could, in principle, be revealed by unsupervised learning. However, there is an enormous variety of physical causes; trees have different colors, textures, leaf patterns, and so on, and they all look very different from animals. Without external supervision, the best that unsupervised learning can achieve is to uncover generic structure that is shared across a variety of physical causes. Fortunately, given some general assumptions or principles, there are plenty of interesting problems to solve.
One general principle for unsupervised learning is minimum entropy, proposed in Barlow’s article. The idea is that the derived representation should minimize the redundancy (correlation) contained in the input data. The goal is similar to that pursued in communication theory: to minimize the bandwidth needed for signal transmission. Closely associated is the minimum description length principle advocated in the Zemel and Hinton article on learning population codes.
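As a rough illustration of the redundancy-reduction idea, the following Python sketch recodes two correlated observations by rotating them onto the principal axes of their covariance. This is not the procedure from Barlow's article: decorrelation removes only second-order redundancy, the entropy figures assume Gaussian marginals, and the data, the mixing angle, and all function names are hypothetical choices made for the example.

```python
import math
import random


def covariance_2d(points):
    """Return (var_x, var_y, cov_xy) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / n
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return var_x, var_y, cov_xy


def correlation(points):
    """Pearson correlation between the two coordinates."""
    var_x, var_y, cov_xy = covariance_2d(points)
    return cov_xy / math.sqrt(var_x * var_y)


def marginal_entropy_sum(points):
    """Sum of per-coordinate entropies in nats, assuming Gaussian marginals."""
    var_x, var_y, _ = covariance_2d(points)
    return sum(0.5 * math.log(2 * math.pi * math.e * v) for v in (var_x, var_y))


def decorrelate(points):
    """Rotate the data onto the principal axes of its covariance."""
    var_x, var_y, cov_xy = covariance_2d(points)
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * y, -s * x + c * y) for x, y in points]


# Assumed toy data: two independent hidden causes, mixed by a rotation
# so that the observed coordinates are strongly correlated.
random.seed(0)
mix_c, mix_s = math.cos(0.6), math.sin(0.6)
observed = []
for _ in range(2000):
    a = random.gauss(0.0, 3.0)
    b = random.gauss(0.0, 0.5)
    observed.append((mix_c * a - mix_s * b, mix_s * a + mix_c * b))

recoded = decorrelate(observed)

corr_before = correlation(observed)
corr_after = correlation(recoded)
h_before = marginal_entropy_sum(observed)
h_after = marginal_entropy_sum(recoded)

print(f"correlation: {corr_before:.3f} -> {corr_after:.3e}")
print(f"sum of marginal entropies (nats): {h_before:.3f} -> {h_after:.3f}")
```

Because a rotation leaves the joint entropy unchanged, any drop in the sum of the per-coordinate entropies reflects redundancy removed from the code. This is the sense in which the recoded units could be transmitted separately at lower total cost, under the Gaussian assumption stated above.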
Another principle, put forward in Field’s article, is sparse coding: The goal of the representation is to minimize the number of units in a distributed network that are activated by