Direct Kernel Perceptron (DKP): Ultra-fast kernel ELM-based classification with non-iterative closed-form weight calculation

The Direct Kernel Perceptron (DKP) (Fernández-Delgado et al., 2010) is a very simple and fast kernel-based classifier, related to the Support Vector Machine (SVM) and to the Extreme Learning Machine (ELM) (Huang, Wang, & Lan, 2011), whose α-coefficients are calculated directly, without any iterative training, using an analytical closed-form expression that involves only the training patterns. The DKP, inspired by the Direct Parallel Perceptron (Auer et al., 2008), uses a Gaussian kernel and a linear classifier (perceptron). The weight vector of this classifier in the feature space minimizes an error measure that combines the training error and the hyperplane margin, without any tunable regularization parameter. Through a change of variable, this weight vector translates into the α-coefficients, so both are determined without iterative calculations.

We derive solutions for several error functions, and the linear error function achieves the best trade-off between accuracy and efficiency. These solutions for the α-coefficients can be considered alternatives to the ELM, with a new physical meaning in terms of error and margin: in fact, the linear and quadratic DKP are special cases of the two-class ELM when the regularization parameter C takes the values C=0 and C=∞, respectively.

The linear DKP is extremely efficient and, over a collection of 42 benchmark and real-life data sets, much faster than 12 very popular and accurate classifiers, including SVM, Multi-Layer Perceptron, AdaBoost, Random Forest and Bagging of RPART decision trees, Linear Discriminant Analysis, K-Nearest Neighbors, ELM, Probabilistic Neural Networks, Radial Basis Function neural networks, and Generalized ART. Moreover, despite its simplicity and extreme efficiency, the DKP achieves higher accuracies than 7 of the 12 classifiers, with only small differences with respect to the best ones (SVM, ELM, AdaBoost and Random Forest), which are much slower.
Thus, the DKP provides an easy and fast way to achieve classification accuracies that are not far from the best attainable for a given problem. C and Matlab implementations of the DKP are freely available.
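The connection stated above can be sketched in code. Assuming the two-class kernel ELM solution α = (I/C + K)⁻¹y (with K the Gaussian kernel matrix over the training patterns), the limit C→0 gives α proportional to y (the linear DKP) and C→∞ gives α = K⁻¹y (the quadratic DKP). The sketch below is illustrative only, under that assumption; function names and the ridge term are ours, not the authors' reference implementation.

```python
# Hedged sketch of linear/quadratic DKP decision functions, assuming the
# two-class kernel ELM form alpha = (I/C + K)^{-1} y described in the text:
# C -> 0 yields alpha proportional to y (linear DKP); C -> infinity yields
# alpha = K^{-1} y (quadratic DKP). Not the authors' C/Matlab code.
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-pattern sets A and B."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * sq)

def dkp_fit(X, y, variant="linear", gamma=1.0):
    """Closed-form alpha coefficients; y holds labels in {-1, +1}."""
    if variant == "linear":              # C -> 0 limit: alpha proportional to y
        return y.astype(float)
    K = gaussian_kernel(X, X, gamma)     # C -> infinity limit: alpha = K^{-1} y
    # Tiny ridge added only for numerical stability of the solve.
    return np.linalg.solve(K + 1e-10 * np.eye(len(y)), y.astype(float))

def dkp_predict(X_train, alpha, X_test, gamma=1.0):
    """Kernel perceptron decision: sign of sum_i alpha_i K(x, x_i)."""
    return np.sign(gaussian_kernel(X_test, X_train, gamma) @ alpha)

# Toy two-class problem: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
alpha = dkp_fit(X, y, variant="linear")
pred = dkp_predict(X, alpha, X)
print(np.mean(pred == y))  # training accuracy on this toy set
```

Note that in the linear variant no linear system is solved at all; prediction reduces to the sign of a label-weighted kernel sum, which is why the linear DKP is so fast to train.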

[1] Senén Barro, et al. Direct Parallel Perceptrons (DPPs): Fast Analytical Calculation of the Parallel Perceptrons Weights With Margin Control for Classification Tasks, 2011, IEEE Transactions on Neural Networks.

[2] David J. Sheskin, et al. Handbook of Parametric and Nonparametric Statistical Procedures, 1997.

[3] Dianhui Wang, et al. Extreme learning machines: a survey, 2011, Int. J. Mach. Learn. Cybern.

[4] Yifei Wang, et al. Geometric Algorithms to Large Margin Classifier Based on Affine Hulls, 2012, IEEE Transactions on Neural Networks and Learning Systems.

[5] Donald F. Specht, et al. Probabilistic neural networks, 1990, Neural Networks.

[6] Peter Auer, et al. A learning rule for very simple universal approximators consisting of a single layer of perceptrons, 2008, Neural Networks.

[7] Manuel Fernández Delgado, et al. Exhaustive comparison of colour texture features and classification methods to discriminate cells categories in histological images of fish ovary, 2013, Pattern Recognit.

[8] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[9] Richard D. Braatz, et al. Fisher Discriminant Analysis, 2000.

[10] Hongming Zhou, et al. Extreme Learning Machine for Regression and Multiclass Classification, 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[11] B. Schölkopf, et al. Fisher discriminant analysis with kernels, 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop.

[12] Thomas Martinetz, et al. SoftDoubleMaxMinOver: Perceptron-Like Training of Support Vector Machines, 2009, IEEE Transactions on Neural Networks.

[13] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[14] Arno Formella, et al. Common Scab Detection on Potatoes Using an Infrared Hyperspectral Imaging System, 2011, ICIAP.

[15] Simon Haykin, et al. Neural Networks: A Comprehensive Foundation, 1998.

[16] Kar-Ann Toh. An error-counting network for pattern classification, 2008, Neurocomputing.

[17] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[18] Senén Barro, et al. Fast weight calculation for kernel-based perceptron in two-class classification problems, 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[19] Qinyu Zhu. Extreme Learning Machine, 2013.

[20] David Casasent, et al. A closed-form neural network for discriminatory feature extraction from high-dimensional data, 2001, Neural Networks.

[21] Lei Chen, et al. Enhanced random search based incremental extreme learning machine, 2008, Neurocomputing.

[22] Chih-Jen Lin, et al. A comparison of methods for multiclass support vector machines, 2002, IEEE Trans. Neural Networks.

[23] Constantinos Panagiotakopoulos, et al. The Margitron: A Generalized Perceptron With Margin, 2011, IEEE Transactions on Neural Networks.

[24] Pilar Carrión, et al. Classification of honeybee pollen using a multiscale texture filtering scheme, 2004, Machine Vision and Applications.

[25] Benyong Liu. Kernel-based nonlinear discriminator with closed-form solution, 2003, Proceedings of the 2003 International Conference on Neural Networks and Signal Processing.

[26] Xi-zhao Wang, et al. Architecture selection for networks trained with extreme learning machine using localized generalization error model, 2013.

[27] Hongming Zhou, et al. Optimization method based extreme learning machine for classification, 2010, Neurocomputing.

[28] Chee Peng Lim, et al. A Hybrid ART-GRNN Online Learning Neural Network with an ε-insensitive Loss Function, 2022.

[29] Xizhao Wang, et al. Dynamic ensemble extreme learning machine based on sample entropy, 2012, Soft Comput.

[30] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[31] Amaury Lendasse, et al. TROP-ELM: A double-regularized ELM using LARS and Tikhonov regularization, 2011, Neurocomputing.

[32] Augusto Montisci, et al. Geometrical synthesis of MLP neural networks, 2008, Neurocomputing.

[33] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[34] William N. Venables, et al. Modern Applied Statistics with S, 2010.

[35] Roberto Iglesias Rodríguez, et al. Comparison of several chemometric techniques for the classification of orujo distillate alcoholic samples from Galicia (northwest Spain) according to their certified brand of origin, 2010.

[36] E. Cernadas, et al. Automatic detection and classification of grains of pollen based on shape and texture, 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[37] Yujian Li, et al. Multiconlitron: a general piecewise linear classifier, 2011, IEEE Transactions on Neural Networks.