Coresets for Decision Trees of Signals

A k-decision tree t (or k-tree) is a recursive partition of a matrix (2D signal) into k ≥ 1 block matrices (axis-parallel rectangles, leaves), where each rectangle is assigned a real label. Its regression or classification loss with respect to a given matrix D of N entries (labels) is the sum of squared differences between every label in D and the label assigned to it by t. Given an error parameter ε ∈ (0, 1), a (k, ε)-coreset C of D is a small summary that provably approximates this loss for every such tree, up to a multiplicative factor of 1 ± ε. In particular, the optimal k-tree of C is a (1 + ε)-approximation to the optimal k-tree of D. We provide the first algorithm that outputs such a (k, ε)-coreset for every such matrix D. The size |C| of the coreset is polynomial in k log(N)/ε, and its construction takes O(Nk) time. This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry. Experimental results on sklearn and lightGBM show that applying our coresets to real-world datasets speeds up the training of random forests and their parameter tuning by up to x10, while keeping similar accuracy. Full open source code is provided.
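In symbols, the loss and the coreset guarantee described above can be restated as follows (the notation t(i,j), for the label that t assigns to the leaf rectangle containing entry (i,j), is ours):

\[
\mathrm{loss}(t, D) \;=\; \sum_{(i,j)} \bigl(D_{i,j} - t(i,j)\bigr)^2,
\]
\[
\bigl|\mathrm{loss}(t, C) - \mathrm{loss}(t, D)\bigr| \;\le\; \varepsilon \cdot \mathrm{loss}(t, D)
\quad\text{for every } k\text{-tree } t,
\]

so minimizing the loss over C yields a (1 + ε)-approximation to the optimal k-tree of D, with |C| polynomial in k log(N)/ε.

The sketch below illustrates the intended workflow on a toy 2D signal using sklearn's DecisionTreeRegressor: fit a k-leaf tree on a small weighted summary instead of the full matrix. The uniform-sampling "coreset" used here is only a stand-in for the paper's (k, ε)-construction and carries no 1 ± ε guarantee.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 2D signal: an N-entry matrix whose entries we regress from (row, col).
rng = np.random.default_rng(0)
D = rng.normal(size=(256, 256))
rows, cols = np.indices(D.shape)
X = np.column_stack([rows.ravel(), cols.ravel()])  # features: cell coordinates
y = D.ravel()                                      # labels: matrix entries
N = y.size

# Stand-in "coreset": uniform sample of m cells with weights N/m. This is NOT
# the paper's construction; it only shows how a weighted summary is consumed.
m = 2000
idx = rng.choice(N, size=m, replace=False)
w = np.full(m, N / m)

# A k-tree here is a decision tree with k leaves over axis-parallel rectangles.
k = 32
t_core = DecisionTreeRegressor(max_leaf_nodes=k, random_state=0)
t_core.fit(X[idx], y[idx], sample_weight=w)
t_full = DecisionTreeRegressor(max_leaf_nodes=k, random_state=0).fit(X, y)

# Sum-of-squared-differences loss of each tree, evaluated on the full matrix.
def loss(t):
    return float(np.sum((y - t.predict(X)) ** 2))

print(f"loss of coreset-trained tree: {loss(t_core):.1f}")
print(f"loss of tree trained on full data: {loss(t_full):.1f}")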
