Selecting optimal experiments for feedforward multilayer perceptrons

Where should a researcher conduct experiments to provide training data for a multilayer perceptron? This question is investigated, and a statistically based method for optimally selecting experimental design points for multilayer perceptrons is introduced. Specifically, a criterion is developed based on the size of an estimated confidence ellipsoid for the weights in the multilayer perceptron. This criterion is minimized over a set of exemplars to find optimal design points. Until now, only graphical and heuristic algorithms were available. Initially, single-output networks are examined, in which the multilayer perceptron is viewed as a univariate nonlinear model. An example demonstrates the superiority of optimally selected design points over randomly chosen points and points chosen in a grid pattern. Two measures are also used successfully to rank the design points by their importance. Because of the dense interconnectivity of multilayer perceptrons, locating design points can be computationally expensive, so two methods are presented to reduce this complexity significantly: a distributed linear feedthrough network structure and a weight subset method. Next, multiple-output networks are examined, with the multilayer perceptron viewed as a multivariate nonlinear model. The criterion for selecting design points in this framework becomes more complex, and a simplifying technique is employed to choose the desired outputs of the network judiciously so that the actual outputs are uncorrelated. Finally, the methods described above are integrated into a comprehensive procedure and tested on two applications dealing with aircraft survivability. The single-output methodology is demonstrated on classifying the performance of armor-piercing incendiary projectiles striking composite materials, and the multiple-output methodology is applied to a seven-class problem relating time and stress to projectile performance. In both cases, simulating the indicated experiments produced a superior multilayer perceptron.
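To make the selection criterion concrete: under standard nonlinear-regression assumptions, the volume of the confidence ellipsoid for the weight vector is proportional to det(J^T J)^{-1/2}, where J stacks the gradients of the network output with respect to the weights at each design point. Minimizing the ellipsoid therefore amounts to choosing design points that maximize det(J^T J), the classical D-optimality criterion. The sketch below illustrates one common way to do this, sequential (greedy) augmentation via the matrix determinant lemma; it is not the dissertation's implementation, and the network size, candidate grid, and point budget are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): sequential D-optimal
# selection of design points for a single-output tanh MLP. Each candidate
# input x contributes a row j(x) = d(output)/d(weights) to J; we greedily
# add the point that most inflates det(J^T J).
import numpy as np

rng = np.random.default_rng(0)

# A small 1-5-1 MLP; the (fixed, previously estimated) weights are random here.
n_in, n_hid = 1, 5
W1 = rng.normal(size=(n_hid, n_in))
b1 = rng.normal(size=n_hid)
W2 = rng.normal(size=n_hid)
b2 = rng.normal()

def output_grad(x):
    """Gradient of the scalar network output w.r.t. all weights at input x."""
    a = W1 @ x + b1            # hidden pre-activations
    h = np.tanh(a)             # hidden activations
    dh = 1.0 - h**2            # tanh'(a)
    dW2, db2 = h, 1.0          # output-layer gradients
    dW1 = np.outer(W2 * dh, x) # chain rule back to the input layer
    db1 = W2 * dh
    return np.concatenate([dW1.ravel(), db1, dW2, [db2]])

# Candidate exemplars: a fine grid over the one-dimensional input space.
candidates = [np.array([x]) for x in np.linspace(-3.0, 3.0, 121)]
J_rows = np.array([output_grad(x) for x in candidates])
p = J_rows.shape[1]            # number of weights

# By the matrix determinant lemma, adding row j multiplies det(M) by
# (1 + j^T M^{-1} j), so each step picks the candidate maximizing that factor.
M = 1e-6 * np.eye(p)           # small ridge so M is invertible at the start
chosen = []
for _ in range(2 * p):         # budget: twice as many points as weights
    Minv = np.linalg.inv(M)
    gains = np.einsum('ij,jk,ik->i', J_rows, Minv, J_rows)
    best = int(np.argmax(gains))
    chosen.append(candidates[best][0])
    M += np.outer(J_rows[best], J_rows[best])

print("selected design points:", sorted(set(np.round(chosen, 2))))
```

Note that the greedy step may select the same input more than once; replicating informative design points is legitimate in optimal design, and in practice the selected points cluster where the network output is most sensitive to the weights.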
