Optimization in an Error Backpropagation Neural Network Environment with a Performance Test on a Spectral Pattern Classification Problem

This paper attempts to develop a mathematically rigorous framework for minimising the cross-entropy function in an error backpropagation setting. In doing so, we derive the backpropagation formulae for evaluating the partial derivatives in a computationally efficient way. Various techniques for optimising the multiple-class cross-entropy error function to train single hidden layer neural network classifiers with softmax output transfer functions are investigated on a real-world multispectral pixel-by-pixel classification problem that is of fundamental importance in remote sensing. These techniques include epoch-based and batch versions of gradient descent, PR-conjugate gradient, and BFGS quasi-Newton error backpropagation. The method of choice depends upon the nature of the learning task and on whether one wants to optimise learning for speed or for classification performance. In the comparison, gradient descent error backpropagation provided the best and most stable out-of-sample performance across batch and epoch-based modes of operation. If the goal is to maximise learning speed and a sacrifice in classification accuracy is acceptable, then PR-conjugate gradient error backpropagation tends to be superior. If the training set is very large, stochastic epoch-based versions of local optimisers should be chosen, using a larger rather than a smaller epoch size to avoid unacceptable instabilities in the classification results.
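
As an illustration of the setting described above, the following is a minimal sketch of gradient descent error backpropagation for a single hidden layer classifier with softmax outputs and a multiple-class cross-entropy error. It is not the paper's implementation. The logistic hidden transfer function, the function names, the learning rate, and the reading of `epoch_size` as the number of patterns drawn per weight update are all assumptions made for the example. The one fact the sketch relies on is standard: with softmax outputs y and one-of-C targets t, the derivative of the cross-entropy with respect to the output net inputs is y - t.

```python
import numpy as np

def forward(params, X):
    """Forward pass: logistic hidden layer (assumed), softmax output layer."""
    W1, b1, W2, b2 = params
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # hidden activations
    A = H @ W2 + b2                             # output net inputs
    A = A - A.max(axis=1, keepdims=True)        # shift for numerical stability
    Y = np.exp(A)
    Y /= Y.sum(axis=1, keepdims=True)           # softmax outputs
    return H, Y

def cross_entropy(params, X, T):
    """Multiple-class cross-entropy summed over patterns (T is one-of-C coded)."""
    _, Y = forward(params, X)
    return -np.sum(T * np.log(Y + 1e-12))

def gradients(params, X, T):
    """Partial derivatives of the cross-entropy obtained by backpropagation."""
    W1, b1, W2, b2 = params
    H, Y = forward(params, X)
    delta_out = Y - T                               # softmax + cross-entropy
    gW2 = H.T @ delta_out
    gb2 = delta_out.sum(axis=0)
    delta_hid = (delta_out @ W2.T) * H * (1.0 - H)  # back through the logistic
    gW1 = X.T @ delta_hid
    gb1 = delta_hid.sum(axis=0)
    return [gW1, gb1, gW2, gb2]

def gradient_descent(params, X, T, lr=0.005, epoch_size=None, n_steps=200, seed=0):
    """Batch mode if epoch_size is None; otherwise each update uses a random
    subset of epoch_size patterns (hypothetical reading of 'epoch-based')."""
    rng = np.random.default_rng(seed)
    params = [p.copy() for p in params]
    for _ in range(n_steps):
        if epoch_size is None:
            idx = slice(None)
        else:
            idx = rng.choice(len(X), size=epoch_size, replace=False)
        grads = gradients(params, X[idx], T[idx])
        params = [p - lr * g for p, g in zip(params, grads)]
    return params
```

Calling `gradient_descent(params, X, T)` performs batch updates, while `gradient_descent(params, X, T, epoch_size=10)` updates on random subsets of ten patterns. Conjugate gradient or quasi-Newton variants would reuse the same `cross_entropy` and `gradients` functions and change only how the search direction and step length are chosen.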
