Ensemble Learning with Supervised Kernels

Kernel-based methods achieve outstanding performance on many machine learning and pattern recognition tasks. However, they are sensitive to the choice of kernel, may tolerate noise poorly, and cannot handle mixed-type or missing data. We propose to derive a novel kernel from an ensemble of decision trees. This yields kernel methods that naturally handle noisy and heterogeneous data with potentially non-randomly missing values. We demonstrate excellent performance of regularized least-squares learners based on such kernels.
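
Below is a minimal sketch of one way a tree-ensemble kernel could be constructed and used with a regularized least-squares learner. The proximity-style kernel (the fraction of trees in which two samples land in the same leaf), the dataset, and all hyperparameters are illustrative assumptions, not the paper's exact construction.

```python
# Sketch (assumed, not the authors' exact method): build a kernel from a
# random-forest ensemble by counting how often two samples share a leaf,
# then train a regularized least-squares (kernel ridge) learner on it.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def tree_ensemble_kernel(forest, X_a, X_b):
    """K[i, j] = fraction of trees in which X_a[i] and X_b[j] fall in the same leaf."""
    leaves_a = forest.apply(X_a)   # shape (n_a, n_trees): leaf index per tree
    leaves_b = forest.apply(X_b)   # shape (n_b, n_trees)
    return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)

X, y = load_breast_cancer(return_X_y=True)
y = 2.0 * y - 1.0                  # regress on {-1, +1} labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Supervised ensemble that induces the kernel.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr > 0)

# Regularized least squares in the induced kernel: solve (K + lam*n*I) alpha = y.
lam = 1e-2                         # Tikhonov regularization strength (assumed)
K_tr = tree_ensemble_kernel(forest, X_tr, X_tr)
alpha = np.linalg.solve(K_tr + lam * len(X_tr) * np.eye(len(X_tr)), y_tr)

K_te = tree_ensemble_kernel(forest, X_te, X_tr)
y_pred = np.sign(K_te @ alpha)
print("test accuracy:", (y_pred == y_te).mean())
```

Because the kernel is defined through leaf membership in the trees, it inherits the ensemble's robustness to noise and its native handling of heterogeneous features; the regularization parameter lam would be tuned by cross-validation in practice.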
