Paper information - Robust separation of finite sets via quadratics

Abstract: Given a pair of finite disjoint sets A and B in R^n, a fundamental problem with many important applications is to efficiently determine a non-trivial, yet ‘simple’, function f(x) belonging to a prespecified family F which separates these sets when they are separable, or ‘nearly’ separates them when they are not. The class F most commonly addressed to date is the linear functions, because linear programming is often a convenient and efficient tool both for determining separability and for generating a suitable separator. When the sets are not linearly separable, they may still be separable over a wider class F of functions, e.g., quadratics. Even when the sets are linearly separable, another function may ‘better’ separate them, in the sense of more accurately predicting the status of points outside of A∪B. We define a ‘robust’ separator f as one for which the minimum Euclidean distance between A∪B and the set S = {x ∈ R^n : f(x) = 0} is maximal. In this paper we focus on robust quadratic separators and develop an algorithm using sequential linear programming to produce one when one exists. Numerical results are presented.

Scope and purpose: A fundamental problem with many important applications is to efficiently determine a non-trivial, yet ‘simple’, function f(x) which separates a pair of sets A and B in the sense that f is positive over A and negative over B. The function is then used to associate points outside of the sets with either A or B. As an example, if A consists of the results of tissue samples from cancerous patients, and B consists of the results of tissue samples from non-cancerous patients, a new sample c will be associated with either A or B according to the sign of the value f(c). Most of the literature to date has focused on linear functions f, as they are relatively easy to compute. In this paper we explore the use of quadratic functions. The advantage of using such functions is twofold: they can often separate when linear functions cannot, and they can separate more accurately than linear functions. We first define the notion of a ‘robust’ separating function, which is as immune as possible (given the data) to small perturbations of the data. We then suggest an algorithm to (approximately) compute a robust quadratic separator, and show that it can be computed via a sequence of linear programs. The algorithm is tested both on randomly generated problems and on the publicly available ‘Wisconsin Breast Cancer Database’. Its accuracy on this database is somewhat higher than that obtained by using linear robust separators.
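The definitions above (f positive over A, negative over B, and robustness measured by the minimum Euclidean distance from A∪B to the zero set S) can be illustrated with a small sketch. This is not the paper's sequential linear programming algorithm. It is a toy example under stated assumptions: the data points are made up, and the family F is restricted to circles centered at the origin, f(x) = r^2 - ||x||^2, because for that special case the distance from a point p to S has the closed form | ||p|| - r |. The helper names `quadratic`, `separates`, `circle_separator`, and `circle_margin` are hypothetical.

```python
import math

def quadratic(Q, c, d):
    """Return f(x) = x^T Q x + c^T x + d for points x given as sequences."""
    def f(x):
        n = len(x)
        quad = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        lin = sum(c[i] * x[i] for i in range(n))
        return quad + lin + d
    return f

def separates(f, A, B):
    """True if f is positive over A and negative over B."""
    return all(f(a) > 0 for a in A) and all(f(b) < 0 for b in B)

def circle_separator(r):
    """Assumed special case: f(x) = r^2 - ||x||^2, positive inside the circle."""
    return quadratic([[-1.0, 0.0], [0.0, -1.0]], [0.0, 0.0], r * r)

def circle_margin(r, points):
    """Minimum Euclidean distance from the points to S = {x : ||x|| = r}."""
    return min(abs(math.hypot(*p) - r) for p in points)

# Made-up toy data: A lies inside the convex hull of B, so no linear
# function can separate the two sets, but a quadratic (a circle) can.
A = [(0.0, 0.0), (0.5, 0.0), (0.0, -0.5)]
B = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]

# Both radii separate A from B, but they differ in robustness.
for r in (0.6, 1.25):
    f = circle_separator(r)
    print(r, separates(f, A, B), circle_margin(r, A + B))
```

Within this restricted family, every radius strictly between 0.5 and 2 separates the toy sets, and the midpoint r = 1.25 gives the largest margin. A separator with r = 0.6 also classifies every point correctly but passes much closer to the points of A, so a small perturbation of the data could flip a classification.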

James E. Falk | Vladimir E. Karlov
