Neural Networks and Related Methods for Classification

Feed-forward neural networks are now widely used in classification problems, whereas nonlinear methods of discrimination developed in the statistical field are much less widely known. A general framework for classification is set up within which methods from statistics, neural networks, pattern recognition and machine learning can be compared. Neural networks emerge as one of a class of flexible nonlinear regression methods which can be used to classify via regression. Many interesting issues remain, including parameter estimation, the assessment of classifiers and algorithm development.
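
As an illustration of what classifying via regression can look like, the sketch below fits a small feed-forward network by least squares to 0/1 class-indicator targets and assigns each case to the class with the largest fitted output. It is a minimal sketch and not the procedure of the paper: the synthetic two-class data, the network size, the learning rate and the names fit_network and predict are all assumptions made for this example.

```python
import numpy as np

def fit_network(X, Y, n_hidden=4, lr=0.1, n_iter=500, seed=0):
    """Fit a one-hidden-layer network (tanh hidden units, linear outputs)
    to indicator targets Y by full-batch gradient descent on squared error.
    All settings are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, k))
    b2 = np.zeros(k)
    for _ in range(n_iter):
        H = np.tanh(X @ W1 + b1)           # hidden-layer activations
        out = H @ W2 + b2                  # regression estimates of the indicators
        err = (out - Y) / n                # gradient of (1/2n) * sum of squared errors
        dH = (err @ W2.T) * (1.0 - H**2)   # back-propagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Classify each row of X by the largest fitted regression output."""
    W1, b1, W2, b2 = params
    out = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.argmax(out, axis=1)

# Hypothetical data: two well-separated Gaussian classes in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(2.0, 0.5, size=(50, 2))])
labels = np.repeat([0, 1], 50)
Y = np.eye(2)[labels]                      # one indicator column per class

params = fit_network(X, Y)
accuracy = np.mean(predict(params, X) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

Swapping fit_network for any other flexible regression method that returns one fitted value per class leaves the classification rule in predict unchanged, which is the sense in which such methods can be compared within a single framework.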
