Automated design of neural network architecture for classification

This Ph.D. thesis deals with finding a good architecture for a neural network classifier. The focus is on methods to improve the performance of existing architectures (i.e. architectures that are initialised by a good academic guess) and on automatically building neural networks. An introduction to the multi-layer feed-forward neural network is given, and the most essential property of neural networks, their ability to learn from examples, is discussed. Topics such as training and generalisation are treated in more detail. On the basis of this discussion, methods for finding a good network architecture are described. These include early stopping, cross-validation, regularisation, pruning and various construction algorithms (methods that successively build a network). New ideas for combining units with different types of transfer functions, such as radial basis functions and sigmoid or threshold functions, led to the development of a new construction algorithm for classification. The algorithm, called "GLOCAL", is fully described. Results from experiments on real-life data from a Synthetic Aperture Radar (SAR) are provided. The thesis was written so that people from industry and graduate students who are interested in neural networks would hopefully find it useful. Key words: Neural networks, Architectures, Training, Generalisation, deductive and construction algorithms.
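The idea of combining units with different transfer functions can be illustrated with a minimal sketch of a hidden layer that mixes "global" sigmoid units (soft hyperplanes) with "local" Gaussian radial basis function units (bumps around a centre). This is not the GLOCAL algorithm: it shows only what such a mixed layer computes in a forward pass, and omits how units are added or trained. All function names and parameters (`sigmoid_unit`, `rbf_unit`, `mixed_hidden_layer`, `classify`, `width`, `out_w`, `out_b`) are hypothetical.

```python
import math

def sigmoid_unit(x, w, b):
    """Global unit: logistic sigmoid of a weighted sum (a soft hyperplane)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-s))

def rbf_unit(x, c, width):
    """Local unit: Gaussian bump centred at c, equal to 1 when x == c."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-d2 / (2.0 * width ** 2))

def mixed_hidden_layer(x, sig_params, rbf_params):
    """Concatenate the outputs of sigmoid (global) and RBF (local) hidden units.

    sig_params: list of (weights, bias) pairs, one per sigmoid unit (hypothetical layout)
    rbf_params: list of (centre, width) pairs, one per RBF unit (hypothetical layout)
    """
    out = [sigmoid_unit(x, w, b) for w, b in sig_params]
    out += [rbf_unit(x, c, s) for c, s in rbf_params]
    return out

def classify(x, sig_params, rbf_params, out_w, out_b):
    """Linear output unit on the mixed hidden layer, thresholded at 0 (two classes)."""
    h = mixed_hidden_layer(x, sig_params, rbf_params)
    score = sum(wi * hi for wi, hi in zip(out_w, h)) + out_b
    return 1 if score > 0 else 0

# Usage: one sigmoid unit (ignored by the output) and one RBF unit centred at (2, 2).
sig = [([1.0, 0.0], 0.0)]
rbf = [([2.0, 2.0], 1.0)]
print(classify([2.0, 2.0], sig, rbf, out_w=[0.0, 2.0], out_b=-1.0))    # 1: inside the local region
print(classify([10.0, 10.0], sig, rbf, out_w=[0.0, 2.0], out_b=-1.0))  # 0: far from the centre
```

The RBF unit lets the output carve out a small closed region of input space, which a single sigmoid unit cannot do on its own; giving the sigmoid unit a non-zero output weight would add an open half-space decision boundary alongside it.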
