Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Functions

We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees. We derive loss functions that produce dual objectives with a sparse solution. We also show how to regularize convex losses to ensure such a dual sparsity-inducing property, and propose a general method to design screening tests for classification or regression based on ellipsoidal approximations of the optimal set. In addition to producing computational gains, our approach also allows us to compress a dataset into a subset of representative points.
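To make the idea concrete, here is a minimal sketch of a safe sample-screening test. It is not the paper's exact ellipsoidal construction: it uses the simpler special case of a Euclidean ball containing the optimum, derived from the duality gap of a mu-strongly convex objective with a hinge-type ("safe") loss. The function name and arguments are hypothetical, chosen for illustration.

```python
import numpy as np

def safe_sample_screening(X, y, w, gap, mu):
    """Ball-based safe screening test (a simplification of ellipsoidal regions).

    For a mu-strongly convex objective, the optimum w* lies in a ball of
    radius r = sqrt(2 * gap / mu) around the current iterate w, where gap
    is the current duality gap.  With a hinge-type loss, a sample whose
    margin exceeds 1 for *every* w in that ball contributes zero loss at
    the optimum and can be safely discarded.
    """
    r = np.sqrt(2.0 * gap / mu)                # radius of the safe ball
    margins = y * (X @ w)                      # y_i * <x_i, w>
    slack = r * np.linalg.norm(X, axis=1)      # worst-case margin drop over the ball
    return margins - slack > 1.0               # True -> sample can be discarded
```

When `gap = 0` the ball collapses to a point and the rule reduces to the exact optimality condition `y_i <x_i, w*> > 1`; as the gap shrinks during optimization, more samples pass the test and can be removed from subsequent epochs.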
