Optimal Expected-Distance Separating Halfspace

One recently proposed criterion for separating two data sets in discriminant analysis is to use a hyperplane that minimizes the sum of distances to it from all misclassified data points. All distances are measured with a fixed norm, and a point is misclassified when it lies in the wrong halfspace. In this paper we study the problem of determining such an optimal halfspace when points are distributed according to an arbitrary random vector X in R^d. In the unconstrained case in dimension d, we prove that any optimal separating halfspace balances the misclassified points. Moreover, under polyhedrality assumptions on the support of X, there always exists an optimal separating halfspace passing through d affinely independent points. These results extend in a natural way to the case where different norms (or a fixed gauge) are used to measure distances, and to the case where constraints force certain points to be correctly classified.
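
To make the criterion concrete, the following is a minimal sketch for a finite sample only, not the paper's method for a general random vector X. It assumes the hyperplane is written as {x : u.x = b}, that labels are +1 or -1, and that distances use an l_p norm; it relies on the standard fact that the l_p distance from a point x to that hyperplane is |u.x - b| divided by the dual (l_q) norm of u, with 1/p + 1/q = 1. The names `dual_norm` and `misclassified_distance_sum` are hypothetical.

```python
import math


def dual_norm(u, p):
    """Norm of u dual to the l_p norm, i.e. the l_q norm with 1/p + 1/q = 1."""
    if p == 1:
        return max(abs(c) for c in u)   # dual of l_1 is l_inf
    if math.isinf(p):
        return sum(abs(c) for c in u)   # dual of l_inf is l_1
    q = p / (p - 1.0)
    return sum(abs(c) ** q for c in u) ** (1.0 / q)


def misclassified_distance_sum(points, labels, u, b, p=2.0):
    """Sum of l_p distances to the hyperplane u.x = b over misclassified points.

    Assumption: a point with label +1 is correctly classified when u.x - b >= 0,
    and a point with label -1 when u.x - b <= 0.
    """
    total = 0.0
    for x, y in zip(points, labels):
        signed = y * (sum(ui * xi for ui, xi in zip(u, x)) - b)
        if signed < 0:                  # point lies in the wrong halfspace
            total += -signed
    return total / dual_norm(u, p)


# Hypothetical toy data on a line embedded in the plane.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
labels = [-1, 1, 1, -1]
print(misclassified_distance_sum(points, labels, u=(1.0, 0.0), b=1.5))
```

Dividing by the dual norm makes the value independent of how (u, b) is scaled, so it depends only on the halfspace itself. Minimizing this quantity over (u, b) is the optimization problem the abstract refers to; the sketch only evaluates the objective.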
