Neural networks for pattern recognition

From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, the principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.
