Synergy of Monotonic Rules

This article describes a method for constructing a special rule (which we call a synergy rule) that takes as input the outputs (scores) of several monotonic rules solving the same pattern recognition problem. As an example of such monotonic rules, we consider SVM classifiers. To construct the optimal synergy rule, we estimate the conditional probability function in the direct problem setting, which requires solving a Fredholm integral equation. In general, solving a Fredholm equation is an ill-posed problem. In our model, however, we seek the solution of the equation in the set of monotonic and bounded functions, which makes the problem well-posed and allows the equation to be solved accurately even with training sets of limited size. To construct a monotonic solution, we use functions belonging to the Reproducing Kernel Hilbert Space (RKHS) associated with the INK-spline kernel (splines with an Infinite Number of Knots) of degree zero. The paper details methods for estimating multidimensional conditional probability functions in a set of monotonic functions and for obtaining the corresponding synergy rules. We demonstrate the effectiveness of such rules for 1) solving standard pattern recognition problems, 2) constructing multi-class classification rules, and 3) constructing a method for knowledge transfer from multiple intelligent teachers in the LUPI (Learning Using Privileged Information) paradigm.
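For readers unfamiliar with the direct setting, the sketch below shows the form such a Fredholm equation takes for conditional probability estimation. The notation is assumed from the authors' related work on statistical inference (the V-matrix line of papers), not stated in this abstract.

```latex
% Sketch of the direct (Fredholm) setting for conditional probability
% estimation; notation assumed, not taken from this abstract.
% The function f(x) = P(y = 1 \mid x) satisfies the first-kind equation
\[
  \int \theta(x - x')\, f(x')\, dF(x') \;=\; P(X \le x,\; y = 1),
\]
% where \theta(z) is the coordinate-wise step function and F is the
% cumulative distribution function of x. The left-hand side equals
% E[\mathbb{1}\{X \le x\} f(X)], which is exactly the joint probability
% on the right. Replacing F and the right-hand side by their empirical
% estimates gives an ill-posed problem; restricting f to monotonic
% bounded functions makes it well-posed.
```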

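The degree-zero INK-spline kernel mentioned above is, in one dimension, K(x, x') = min(x, x') on non-negative inputs; a common multidimensional construction takes the product over coordinates, and the sketch below assumes that form (the paper's exact variant may differ). One way to see why this kernel suits monotonic estimation: an expansion sum_i c_i K(x, x_i) with c_i >= 0 is non-decreasing in every coordinate, since each factor min(., x_i) is non-decreasing and non-negative.

```python
# A minimal sketch of the degree-zero INK-spline kernel, assuming the
# 1-D form K(x, x') = min(x, x') and a product across coordinates;
# the paper's exact multidimensional construction may differ.
import numpy as np
from sklearn.svm import SVC

def ink_spline_0(A, B):
    # Gram matrix K[i, j] = prod_k min(A[i, k], B[j, k]) for inputs in [0, 1]^d.
    return np.prod(np.minimum(A[:, None, :], B[None, :, :]), axis=2)

# Toy usage with a precomputed-kernel SVM (stand-in data, not from the paper).
rng = np.random.default_rng(0)
S = rng.random((200, 3))                # e.g., rule scores rescaled to [0, 1]
y = (S.mean(axis=1) > 0.5).astype(int)
clf = SVC(kernel="precomputed").fit(ink_spline_0(S, S), y)
test_scores = clf.decision_function(ink_spline_0(S[:5], S))
```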
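Finally, here is an end-to-end sketch of the synergy idea in its simplest form. Everything below is an assumption for illustration: three SVMs with different kernels play the role of the monotonic rules, and a one-dimensional isotonic regression (PAVA) over the averaged score stands in for the paper's multidimensional monotonic estimate of the conditional probability.

```python
# A 1-D simplification of a synergy rule (illustration only, not the
# paper's method): combine several SVM scores and calibrate them with a
# monotonic, bounded map fitted by isotonic regression (PAVA).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.isotonic import IsotonicRegression

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.5, random_state=0)

# The individual monotonic rules: larger SVM decision values indicate
# higher confidence in class 1.
svms = [SVC(kernel=k, gamma="scale").fit(X_train, y_train)
        for k in ("linear", "rbf", "poly")]

def scores(X):
    # One column of decision-function outputs per rule.
    return np.column_stack([s.decision_function(X) for s in svms])

# Fit a monotonic map from the averaged score to the empirical
# class probability; the output is bounded in [0, 1] by construction.
syn = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
syn.fit(scores(X_cal).mean(axis=1), y_cal)

def synergy_predict_proba(X):
    # Monotonic, bounded estimate of P(y = 1 | scores).
    return syn.predict(scores(X).mean(axis=1))
```

Replacing the averaged score with a genuinely multidimensional monotonic fit over the full score vector is where the paper's method departs from this sketch.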