Heaviside Set Constrained Optimization: Optimality and Newton Method

Real-world data frequently involve a binary status: true or false, positive or negative, similar or dissimilar, spam or non-spam, to name a few, with applications in regression, classification, and related problems. An ideal function for characterizing binary status is the Heaviside step function, which returns one for one status and zero for the other. The Heaviside function is discontinuous, however, so conventional approaches to binary status rely heavily on its continuous surrogates. In this paper, we target the Heaviside step function directly and study Heaviside set constrained optimization: we calculate the tangent and normal cones of the feasible set, establish several first-order necessary and sufficient optimality conditions, and develop a Newton-type method that enjoys locally quadratic convergence and excellent numerical performance.
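For concreteness, the following is a minimal sketch of the objects the abstract names, not the paper's own formulation: the convention H(0) = 0 is assumed (other conventions set H(0) to 1 or 1/2), and the affine map Ax + b together with the level s are hypothetical stand-ins for whatever data define the feasible set in the paper.

\[
H(t) = \begin{cases} 1, & t > 0, \\ 0, & t \le 0, \end{cases}
\qquad
\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad \big\| H(Ax + b) \big\|_0 \le s.
\]

Here H acts componentwise and \(\|\cdot\|_0\) counts nonzero entries, so the constraint bounds how many components of Ax + b land on the "one" side of the step. A feasible set defined through H in this way is nonconvex and discontinuous in exactly the sense the abstract describes, which is why tangent and normal cones, rather than gradients of a smooth surrogate, are the natural first-order objects to compute.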
