A Simple Derivation of a Bound on the Perceptron Margin Using Singular Value Decomposition

The perceptron is a simple supervised learning algorithm for training a linear classifier, and it has been analyzed and used extensively. The classifier separates the data into two groups with a decision hyperplane; the margin between the data and the hyperplane determines the classifier's ability to generalize and its robustness to input noise. Exact results for the maximal size of the separating margin are known for specific input distributions, and bounds exist for arbitrary distributions, but both rely on lengthy statistical-mechanics calculations carried out in the limit of infinite input dimension. Here we present a short analysis of perceptron classification using the singular value decomposition. We give a simple derivation of a lower bound on the margin and an explicit formula for the perceptron weights that converges to the optimal result for large separating margins.
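The abstract does not reproduce the paper's explicit weight formula or the bound itself. As a point of reference, the construction such analyses typically build on is the classical SVD-based pseudoinverse solution for the weights. The following is a minimal Python sketch under that assumption: it computes pseudoinverse weights for random patterns via the SVD and measures the resulting separating margin. It illustrates the ingredients of the setting, not the paper's derivation; the dimensions N, P and the Gaussian input distribution are illustrative choices.

    # Minimal sketch (assumed standard construction, not the paper's result):
    # perceptron weights from the SVD-based pseudoinverse, plus the margin
    # they achieve on the training patterns.
    import numpy as np

    rng = np.random.default_rng(0)

    # P random patterns of dimension N with random binary labels (P < N,
    # so the patterns are linearly separable almost surely).
    N, P = 200, 100
    X = rng.standard_normal((P, N))      # rows are input patterns
    y = rng.choice([-1.0, 1.0], size=P)  # desired +/-1 outputs

    # Pseudoinverse weights w = X^+ y, computed from the SVD X = U S V^T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = Vt.T @ ((U.T @ y) / s)           # equivalent to np.linalg.pinv(X) @ y

    # Margin of a pattern: its signed distance to the hyperplane w.x = 0.
    margins = y * (X @ w) / np.linalg.norm(w)
    print("all patterns classified correctly:", np.all(margins > 0))
    print("minimal margin:", margins.min())

Because P < N here, the pseudoinverse weights satisfy X w = y exactly, so every pattern sits at the same signed distance 1/||w|| from the hyperplane; the quantity of interest in the paper is how this margin compares to the optimal (maximal) one.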
