Comparison-Based Random Forests

Assume we are given a set of items from a general metric space, but we have access neither to the representation of the data nor to the distances between data points. Instead, suppose that we can actively choose a triplet of items (A, B, C) and ask an oracle whether item A is closer to item B or to item C. In this paper, we propose a novel random forest algorithm for regression and classification that relies only on such triplet comparisons. On the theoretical side, we establish sufficient conditions for the consistency of such a forest. In a set of comprehensive experiments, we then demonstrate that the proposed random forest is effective for both classification and regression. In particular, it is even competitive with methods that have direct access to the metric representation of the data.
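The abstract does not spell out how a tree can be grown from triplet answers alone. The sketch below assumes one natural split rule: at each node, draw two pivot items at random and send every item to the pivot it is closer to, which costs exactly one triplet query per item. The names `simulated_oracle`, `ComparisonTree` and `ComparisonForest`, the parameters `leaf_size` and `subsample`, and the stopping and subsampling rules are hypothetical illustrations, not the paper's exact algorithm. The oracle is simulated from hidden Euclidean coordinates only so that the example runs; the learner treats it as a black box.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Assumed interface: oracle(a, b, c) is True iff item a is closer to item b than to item c.
Oracle = Callable[[int, int, int], bool]


def simulated_oracle(coords: Sequence[Sequence[float]]) -> Oracle:
    """Hypothetical test oracle: answers triplet queries from hidden coordinates."""
    def sq_dist(i: int, j: int) -> float:
        return sum((x - y) ** 2 for x, y in zip(coords[i], coords[j]))
    return lambda a, b, c: sq_dist(a, b) < sq_dist(a, c)


class ComparisonTree:
    """Tree whose splits use only triplet queries (assumed two-random-pivot rule)."""

    def __init__(self, items: List[int], labels: Dict[int, int], oracle: Oracle,
                 leaf_size: int, rng: random.Random) -> None:
        self.oracle = oracle
        self.pivots: Optional[Tuple[int, int]] = None
        self.children: Tuple["ComparisonTree", ...] = ()
        # Majority label of the items at this node, returned if the node is a leaf.
        self.label = Counter(labels[i] for i in items).most_common(1)[0][0]
        if len(items) <= max(1, leaf_size) or len({labels[i] for i in items}) == 1:
            return  # small or pure node: stop splitting
        p, q = rng.sample(items, 2)  # two random pivot items
        # One triplet query per item: is it closer to pivot p than to pivot q?
        goes_left = {i: oracle(i, p, q) for i in items}
        left = [i for i in items if goes_left[i]]
        right = [i for i in items if not goes_left[i]]
        if not left or not right:
            return  # degenerate split (e.g. duplicate pivots): keep as a leaf
        self.pivots = (p, q)
        self.children = (ComparisonTree(left, labels, oracle, leaf_size, rng),
                         ComparisonTree(right, labels, oracle, leaf_size, rng))

    def predict(self, x: int) -> int:
        node = self
        while node.pivots is not None:  # route x with the same kind of triplet query
            p, q = node.pivots
            node = node.children[0] if node.oracle(x, p, q) else node.children[1]
        return node.label


class ComparisonForest:
    """Majority vote over comparison trees, each grown on a random subsample."""

    def __init__(self, n_trees: int = 25, leaf_size: int = 1,
                 subsample: float = 0.7, seed: int = 0) -> None:
        self.n_trees = n_trees
        self.leaf_size = leaf_size
        self.subsample = subsample
        self.rng = random.Random(seed)
        self.trees: List[ComparisonTree] = []

    def fit(self, items: List[int], labels: Dict[int, int],
            oracle: Oracle) -> "ComparisonForest":
        m = max(2, int(self.subsample * len(items)))  # subsample size per tree
        self.trees = [ComparisonTree(self.rng.sample(items, m), labels, oracle,
                                     self.leaf_size, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def predict(self, x: int) -> int:
        return Counter(t.predict(x) for t in self.trees).most_common(1)[0][0]


# Usage: two separated Gaussian clusters whose coordinates stay hidden in the oracle.
gen = random.Random(1)
coords = ([(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(40)]
          + [(gen.gauss(8, 1), gen.gauss(8, 1)) for _ in range(40)])
labels = {i: int(i >= 40) for i in range(80)}
train = [i for i in range(80) if i % 4 != 0]
test = [i for i in range(80) if i % 4 == 0]
forest = ComparisonForest(n_trees=15, seed=0).fit(
    train, {i: labels[i] for i in train}, simulated_oracle(coords))
accuracy = sum(forest.predict(i) == labels[i] for i in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Test items are ordinary items that the oracle can compare, so prediction needs no coordinates either. For regression, the same trees could instead store the mean response in each leaf, and the forest would average the tree outputs rather than vote.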
