Nested Barycentric Coordinate System as an Explicit Feature Map

We propose a new embedding method that is particularly well suited to settings where the sample size greatly exceeds the ambient dimension. The technique partitions the space into simplices and embeds each data point into features given by its barycentric coordinates with respect to those simplices. We then train a linear classifier in the resulting feature space. The decision boundary may be highly non-linear overall, yet it is linear within each simplex and hence piecewise-linear. Further, our method can approximate any convex body. We give generalization bounds based on the empirical margin and a novel hybrid sample compression technique. An extensive empirical evaluation shows that our method consistently outperforms a range of popular kernel embedding methods.
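
To make the idea of a barycentric feature map concrete, here is a minimal single-level sketch in Python. It is not the nested construction proposed in the paper. It assumes a fixed, hand-written partition of the unit square into two triangles, and the names `barycentric_coords`, `embed`, and the weight vector `w` are hypothetical, chosen only for illustration. Each point is mapped to a vector with one entry per vertex of the partition. The entries for the vertices of the containing simplex hold the point's barycentric coordinates, and all other entries are zero, so a linear function of the features is linear inside each simplex.

```python
import numpy as np

def barycentric_coords(vertices, x):
    """Barycentric coordinates of x w.r.t. a simplex given as a (d+1, d) array."""
    vertices = np.asarray(vertices, dtype=float)
    x = np.asarray(x, dtype=float)
    # Solve sum_i lam_i * v_i = x together with sum_i lam_i = 1.
    A = np.vstack([vertices.T, np.ones(len(vertices))])
    b = np.append(x, 1.0)
    return np.linalg.solve(A, b)

def embed(points, simplices, x, tol=1e-9):
    """Map x to a feature vector with one entry per vertex of the partition."""
    phi = np.zeros(len(points))
    for simplex in simplices:
        idx = list(simplex)
        lam = barycentric_coords(points[idx], x)
        if np.all(lam >= -tol):  # all coordinates non-negative: x is in this simplex
            phi[idx] = lam
            return phi
    raise ValueError("x lies outside the partitioned region")

# Assumed toy partition: the unit square split into two triangles along a diagonal.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
simplices = [(0, 1, 2), (0, 2, 3)]

phi = embed(points, simplices, [0.5, 0.25])

# Hypothetical weights, one per vertex; in practice a linear classifier learns them.
w = np.array([0.0, 1.0, -1.0, 2.0])
score = float(w @ phi)  # linear in phi, piecewise-linear in the original point
```

In this sketch the weight attached to a vertex is the value the score takes at that vertex, and inside each triangle the score linearly interpolates between the three vertex values. A finer partition gives the linear model more vertices, and therefore more freedom to bend the decision boundary.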
