Evolution and generalization of a single neurone: I. Single-layer perceptron as seven statistical classifiers

Unlike many other investigations of this topic, the present one treats the non-linear single-layer perceptron (SLP) as a process in which the perceptron's weights grow and the sum-of-squares cost function changes gradually. During backpropagation training, the decision boundary of the SLP becomes identical or close to that of seven statistical classifiers: (1) the Euclidean distance classifier, (2) regularized linear discriminant analysis, (3) the standard Fisher linear discriminant function, (4) the Fisher linear discriminant function with a pseudoinverse covariance matrix, (5) the generalized Fisher discriminant function, (6) the minimum empirical error classifier, and (7) the maximum margin classifier. To obtain a wider range of classifiers, five new complexity-control techniques are proposed: target value control, moving the centre of the learning data to the origin of coordinates, zero weight initialization, use of an additional negative weight-decay term called "anti-regularization", and use of an exponentially increasing learning step. Which particular classifier is obtained depends on the data, the cost function to be minimized, the optimization technique and its parameters, and the stopping criteria.
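The training regime described above can be sketched as plain gradient descent on a sum-of-squares cost with a sigmoid-type activation. The sketch below is illustrative only, not the paper's implementation: the function name `train_slp` and its parameters (`target` for target value control, a `decay` coefficient whose negative values act as "anti-regularization", and `eta_growth` for an exponentially increasing learning step) are hypothetical names chosen here to mirror the five complexity controls listed in the abstract.

```python
import numpy as np

def train_slp(X, y, epochs=100, eta0=0.01, eta_growth=1.0,
              decay=0.0, target=1.0):
    """Gradient descent for a single-layer perceptron with a tanh
    activation and a sum-of-squares cost (illustrative sketch).

    Hypothetical knobs mirroring the paper's complexity controls:
      target     -- magnitude of the training targets (target value control)
      decay      -- weight-decay coefficient; a negative value plays the
                    role of "anti-regularization"
      eta_growth -- per-epoch multiplier of the learning step; values > 1
                    give an exponentially increasing step
    """
    X = X - X.mean(axis=0)                 # move the data centre to the origin
    t = np.where(y > 0, target, -target)   # scaled targets
    w = np.zeros(X.shape[1])               # zero weight initialization
    b = 0.0
    eta = eta0
    for _ in range(epochs):
        a = np.tanh(X @ w + b)             # non-linear activation
        err = a - t                        # residual of the squared cost
        grad = err * (1.0 - a ** 2)        # chain rule through tanh
        w -= eta * (X.T @ grad / len(X) + decay * w)
        b -= eta * grad.mean()
        eta *= eta_growth                  # exponentially increasing step
    return w, b
```

With small initial weights the boundary starts near the Euclidean distance classifier; which of the seven classifiers the run ends near depends on how long training continues and on these control parameters.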
