The Physical Correlates of Local Minima

Many supervised neural-net classifiers are trained by optimizing a criterion function. For reasons of efficiency, a local optimizer is usually employed for this purpose. Training is therefore often hampered by local minima, which lead to inferior classification performance. We study the following problem: which physical states of the classifier tend to correspond to local minima in the criterion function? An understanding of these physical correlates of local minima is important, since it may be used to choose the initial weights for training more sensibly and thereby decrease the probability of arriving at a local minimum. For the particular case of backpropagation classifiers, we show that the occurrence of a local minimum in the criterion function can often be related to specific patterns of defects in the classifier. In particular, three main causes of local minima are identified: the straying of hidden nodes so that they are either strongly active or very inactive for all training samples; duplication of function by pairs of hidden nodes; and arrangements of hidden nodes in which all of them are highly inactive in certain regions of feature space. These are the most common causes of local minima, but certain other types of local minima are also shown to exist.
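
As a rough illustration of the three defect patterns named above, the following sketch inspects a matrix of hidden-node activations recorded over the training set. It is not the procedure used in this work: the function name `diagnose_hidden_layer`, the thresholds `lo`, `hi`, and `dup_corr`, the assumption of sigmoid-like activations in [0, 1], and the use of activation correlation as a proxy for "duplication of function" are all hypothetical choices made for the example.

```python
import numpy as np

def diagnose_hidden_layer(H, lo=0.1, hi=0.9, dup_corr=0.98):
    """Flag three defect patterns in hidden activations H.

    H has shape (n_samples, n_hidden); entries are assumed to be
    sigmoid-like activations in [0, 1]. All thresholds are
    illustrative assumptions, not values taken from the source.
    """
    H = np.asarray(H, dtype=float)
    n_hidden = H.shape[1]

    # 1. Stray nodes: strongly active or very inactive for ALL samples.
    always_on = np.where(H.min(axis=0) > hi)[0].tolist()
    always_off = np.where(H.max(axis=0) < lo)[0].tolist()
    stray = set(always_on) | set(always_off)

    # 2. Duplicated function: pairs of non-stray nodes whose activations
    #    are almost perfectly (anti-)correlated across the samples.
    centered = H - H.mean(axis=0)
    norms = np.linalg.norm(centered, axis=0)
    duplicates = []
    for i in range(n_hidden):
        for j in range(i + 1, n_hidden):
            if i in stray or j in stray:
                continue
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # constant node: correlation is undefined
            corr = centered[:, i] @ centered[:, j] / (norms[i] * norms[j])
            if abs(corr) > dup_corr:
                duplicates.append((i, j))

    # 3. Dead regions: samples at which every hidden node is inactive.
    dead_samples = np.where(H.max(axis=1) < lo)[0].tolist()

    return {
        "always_on": always_on,
        "always_off": always_off,
        "duplicates": duplicates,
        "dead_samples": dead_samples,
    }
```

A check of this kind could be run on a freshly initialized network or on one that has stalled during training. Any node or sample it flags is only a candidate for the corresponding defect, since the thresholds are arbitrary and the correlation test is a heuristic.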