Kernel-Matching Pursuits With Arbitrary Loss Functions

The purpose of this research is to develop a classifier that achieves state-of-the-art computational efficiency and generalization ability while allowing the algorithm designer to choose an arbitrary loss function appropriate for a given problem domain. This is critical in applications involving heavily imbalanced, noisy, or non-Gaussian data. To achieve this goal, a kernel-matching pursuit (KMP) framework is formulated in which the objective is margin maximization rather than the standard error minimization. This approach yields excellent performance and computational savings on large, imbalanced training sets and leads to two general algorithms. These algorithms support arbitrary loss functions, allowing the algorithm designer to control the degree to which outliers are penalized and the manner in which non-Gaussian data are handled. Example loss functions are provided, and algorithm performance is illustrated in two groups of experimental results. The first group demonstrates that the proposed algorithms perform comparably to several state-of-the-art machine learning algorithms on widely published, balanced data sets. The second group shows superior performance by the proposed algorithms on imbalanced, non-Gaussian data, achieved by employing loss functions suited to the data characteristics and problem domain.
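The abstract summarizes the approach without presenting the algorithms themselves. As a rough illustration of the general idea only, and not the paper's actual method, the following minimal sketch shows a greedy kernel-matching pursuit in which both atom selection and the step size are driven by a user-supplied margin loss. All names here (rbf_gram, kmp_fit) and the squared-hinge example loss are illustrative assumptions.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    # RBF Gram matrix of the training set; its columns form the KMP dictionary
    # of kernel functions centred on the training points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kmp_fit(X, y, loss, dloss, n_atoms=25, gamma=1.0,
            coef_grid=np.linspace(-2.0, 2.0, 81)):
    """Greedy KMP with a pluggable margin loss (illustrative sketch).

    loss(m)  -> mean loss over margins m_i = y_i * f(x_i)
    dloss(m) -> elementwise derivative of the per-example loss
    Returns the indices of the selected kernel centres and their coefficients.
    """
    K = rbf_gram(X, gamma)
    f = np.zeros(len(y))                       # current function values on the training set
    centres, coefs = [], []
    for _ in range(n_atoms):
        grad_f = dloss(y * f) * y              # functional gradient dL/df(x_i)
        j = int(np.argmax(np.abs(K.T @ grad_f)))  # atom best aligned with the descent direction
        # one-dimensional line search over a coefficient grid for the chosen atom
        a = min(coef_grid, key=lambda c: loss(y * (f + c * K[:, j])))
        f += a * K[:, j]
        centres.append(j)
        coefs.append(a)
    return centres, coefs

# Example margin loss (an assumption, not from the paper): a squared hinge,
# which penalises margin violations quadratically and ignores points already
# classified with margin >= 1.
sq_hinge  = lambda m: np.mean(np.maximum(0.0, 1.0 - m) ** 2)
dsq_hinge = lambda m: -2.0 * np.maximum(0.0, 1.0 - m)

# Toy usage: centres, coefs = kmp_fit(X, y, sq_hinge, dsq_hinge) with y in {-1, +1}.
```

Under this sketch, swapping in a different (loss, dloss) pair is all that is needed to change how outliers are penalized, e.g. a bounded robust loss would down-weight gross outliers where the squared hinge would not.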
