Direct Parallel Perceptrons (DPPs): Fast Analytical Calculation of the Parallel Perceptrons Weights With Margin Control for Classification Tasks

Parallel perceptrons (PPs) are very simple and efficient committee machines (a single layer of perceptrons with threshold activation functions and binary outputs, combined by a majority voting decision scheme), which nevertheless behave as universal approximators. The parallel delta (P-Delta) rule is an effective training algorithm which, following ideas from statistical learning theory also used by the support vector machine (SVM), improves the generalization ability of the PP by maximizing the difference between the perceptron activations for the training patterns and the activation threshold (which corresponds to the separating hyperplane). In this paper, we propose an analytical closed-form expression to calculate the PP weights for classification tasks. Our method, called Direct Parallel Perceptrons (DPPs), computes the weights directly from the training patterns and their desired outputs, without iterations, search, or numerical function optimization. The calculated weights globally minimize an error function that simultaneously accounts for the training error and the classification margin. Because of their analytical, noniterative nature, DPPs are computationally much more efficient than related approaches such as P-Delta and SVM, and their computational complexity is linear in the input dimensionality. DPPs are therefore very appealing in terms of time complexity and memory consumption, and are very easy to use for high-dimensional classification tasks. On real benchmark datasets with two and multiple classes, DPPs are competitive with SVM and other approaches, while also allowing online learning and, unlike most of them, having no tunable parameters.
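
The following is a minimal sketch of the parallel perceptron architecture described above: a single layer of threshold perceptrons whose binary outputs are combined by majority voting. The closed-form weight computation shown here is an illustrative regularized least-squares stand-in, not the DPP expression from the paper (which is not reproduced in this abstract); the class name, bootstrap resampling, and regularization parameter are assumptions made for the example only.

```python
import numpy as np

class ParallelPerceptronSketch:
    """Illustrative PP committee: threshold perceptrons + majority vote.

    The fit() step uses a ridge-regression closed form as a stand-in for a
    noniterative weight calculation; it is NOT the paper's DPP formula.
    """

    def __init__(self, n_perceptrons=3, reg=1e-3, seed=0):
        self.n_perceptrons = n_perceptrons   # committee size (odd avoids vote ties)
        self.reg = reg                       # regularization strength (illustrative)
        self.rng = np.random.default_rng(seed)
        self.W = None                        # one weight vector per perceptron

    def fit(self, X, y):
        # X: (n_samples, n_features); y: labels in {-1, +1}.
        # Each perceptron's weights come from a single linear solve on a
        # bootstrap resample (assumption, used here only to diversify the
        # committee); no iterative training is performed.
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append bias input
        d = Xb.shape[1]
        self.W = np.empty((self.n_perceptrons, d))
        for k in range(self.n_perceptrons):
            idx = self.rng.integers(0, Xb.shape[0], Xb.shape[0])
            A = Xb[idx].T @ Xb[idx] + self.reg * np.eye(d)
            b = Xb[idx].T @ y[idx]
            self.W[k] = np.linalg.solve(A, b)            # closed-form solve
        return self

    def predict(self, X):
        # Threshold activations produce binary votes; the committee output is
        # the sign of the vote sum (majority voting decision scheme).
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        votes = np.sign(Xb @ self.W.T)                   # (n_samples, n_perceptrons)
        votes[votes == 0] = 1
        return np.sign(votes.sum(axis=1))
```

Usage follows the usual fit/predict pattern: construct the committee, call `fit(X, y)` once (a fixed number of linear solves, so training cost stays linear in the number of features), then call `predict(X)` to obtain the majority-vote labels.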