Decision Forests Induce Characteristic Kernels

Decision forests are popular tools for classification and regression. These forests naturally produce proximity matrices that measure how often each pair of observations falls in the same leaf node. It has recently been demonstrated that these proximity matrices can be viewed as kernels, connecting the decision forest literature to the extensive kernel machine literature. While other kernels are known to have strong theoretical properties, such as being characteristic kernels, no comparable result was previously available for any decision-forest-based kernel. We show that a decision-forest-induced proximity can be made into a characteristic kernel, which can be used within an independence test to obtain a universally consistent test. We then empirically evaluate this kernel on a suite of 12 high-dimensional independence-testing settings, where the decision-forest-induced kernel typically achieves substantially higher power than other methods.
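
As a concrete illustration of the proximity kernel described above, the sketch below estimates it from a fitted random forest: entry K[i, j] is the fraction of trees in which observations i and j land in the same leaf. This is a minimal sketch assuming scikit-learn's RandomForestClassifier; the helper name forest_proximity_kernel and its parameters are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def forest_proximity_kernel(X, y, n_trees=500, random_state=0):
    """Estimate the forest proximity kernel: K[i, j] is the fraction
    of trees in which observations i and j fall in the same leaf."""
    forest = RandomForestClassifier(
        n_estimators=n_trees, random_state=random_state
    ).fit(X, y)
    # leaves[i, t] is the index of the leaf that observation i
    # reaches in tree t; shape is (n_samples, n_trees).
    leaves = forest.apply(X)
    n = X.shape[0]
    K = np.zeros((n, n))
    for t in range(leaves.shape[1]):
        # Indicator matrix: do i and j share a leaf in tree t?
        K += leaves[:, t][:, None] == leaves[None, :, t]
    return K / leaves.shape[1]
```

Plugging such a kernel into an HSIC-style statistic with a permutation null gives an independence test of the kind the abstract describes, though the paper's specific construction (which makes the kernel characteristic) may differ from this naive estimate.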
