This paper deals with optimal learning and provides a unified viewpoint of the most significant results in the field. The focus is on the problem of local minima in the cost function, which is likely to affect, to a greater or lesser extent, any learning algorithm. We give some intriguing links between optimal learning and the computational complexity of loading problems. We exhibit a computational model in which the solution of all loading problems giving rise to unimodal error functions requires the same time, thus suggesting that they belong to the same computational class.

1 Learning as optimisation

Supervised learning in multilayered networks (MLNs) can be accomplished thanks to Backpropagation (BP), which is used to minimise pattern misclassifications by means of gradient descent for a particular nonlinear least squares fitting problem. Unfortunately, BP is likely to be trapped in local minima, and indeed many examples of local extremes have been reported in the literature. The presence of local minima derives essentially from two different causes. First, they may arise because of an unsuitable joint choice of the functions that define the network dynamics and the error function. Second, local minima may be inherently related to the structure of the problem at hand. In [5], these two cases have been referred to as spurious and structural local minima, respectively. Problems of sub-optimal solutions may also arise when learning with high initial weights, as a sort of premature neuron saturation occurs, which is strictly related to the neuron fan-in. An interesting way of facing this problem is to use the "relative cross-entropy metric" [10], for which the erroneous saturation of the output neurons does not lead to plateaux, but to very high values of the cost function.
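To make the saturation argument concrete, the following minimal sketch (plain NumPy; the function names are illustrative, and the ordinary cross-entropy cost is used here as a simplified stand-in for the relative cross-entropy metric of [10]) compares the gradient of the squared-error and cross-entropy costs with respect to the net input of a single sigmoid output neuron. With squared error the gradient carries the factor y(1 - y), which vanishes when the neuron saturates at the wrong value, producing a plateau; with cross-entropy that factor cancels, so erroneous saturation yields a large cost but not a flat gradient.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grads_wrt_net_input(a, t):
    """Gradients of both costs w.r.t. the net input a, for target t."""
    y = sigmoid(a)
    g_sq = (y - t) * y * (1.0 - y)   # d/da of 0.5*(y - t)^2
    g_ce = y - t                     # d/da of -[t*log(y) + (1-t)*log(1-y)]
    return g_sq, g_ce

# Erroneously saturated neuron: large net input, but the target is 0.
for a in (2.0, 5.0, 10.0):
    g_sq, g_ce = grads_wrt_net_input(a, t=0.0)
    print(f"a={a:5.1f}  squared-error grad={g_sq:.2e}  cross-entropy grad={g_ce:.2e}")

For a = 10 the squared-error gradient is of order 1e-5 while the cross-entropy gradient stays close to 1, matching the plateau-versus-high-cost distinction drawn above.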
References

[1] Paolo Frasconi, et al. Optimal learning in artificial neural networks: A theoretical view, 1998.
[2] Paolo Frasconi, et al. Learning without local minima in radial basis function networks, IEEE Trans. Neural Networks, 1995.
[3] Paolo Frasconi, et al. Learning in multilayered networks used as autoassociators, IEEE Trans. Neural Networks, 1995.
[4] Marco Gori, et al. Does Terminal Attractor Backpropagation Guarantee Global Optimization?, 1994.
[5] Paolo Frasconi, et al. Multilayered networks and the C-G uncertainty principle, Defense, Security, and Sensing, 1993.
[6] Bedri C. Cetin, et al. Terminal repeller unconstrained subenergy tunneling (TRUST) for fast global optimization, 1993.
[7] Xiao-Hu Yu, et al. Can backpropagation error surface not have local minima, IEEE Trans. Neural Networks, 1992.
[8] Alberto Tesi, et al. On the Problem of Local Minima in Backpropagation, IEEE Trans. Pattern Anal. Mach. Intell., 1992.
[9] Jinhui Chao, et al. How to find global minima in finite times of search for multilayer perceptrons training, Proc. 1991 IEEE International Joint Conference on Neural Networks, 1991.
[10] Ching-Chi Hsu, et al. Terminal attractor learning algorithms for back propagation neural networks, Proc. 1991 IEEE International Joint Conference on Neural Networks, 1991.