Nonlinear separation of data via Mixed 0-1 Integer and Linear Programming

Abstract

This paper presents a new mathematical-programming-based learning methodology for separating two classes of data. Specifically, we develop a new ℓ1-norm error distance metric and use it to build a Mixed 0-1 Integer and Linear Programming (MILP) model that optimizes the interplay of user-provided discriminant functions, including kernel functions for support vector machines, to implement a nonlinear, nonconvex and/or disjoint decision boundary that best separates the data at hand. Because the discriminant functions are optimized concurrently, MILP-based learning can find the optimal and least complex classification rule for noise-free data and can implement the most robust classification rule for real-life data with noise. Extensive experiments on clean and noisy two-dimensional artificial datasets graphically illustrate these advantages of the new MILP-based learning methodology. Experiments on real-life benchmark datasets from the UC Irvine Repository of machine learning databases, in comparison with the multisurface method and support vector machines, demonstrate the advantage of using and concurrently optimizing more than one discriminant function for robust separation of real-life data, and hence the utility of the proposed methodology in supervised learning.
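The abstract does not give the model itself, so the following is only a minimal Python sketch of the general idea, under assumptions that are not taken from the paper: (1) a point is predicted positive if ANY active discriminant function is nonnegative (a union rule, which already yields disjoint regions); (2) the error is the classical hinge-type ℓ1 violation with unit margin used in robust LP discrimination, standing in for the paper's new metric; (3) the 0-1 choice of which functions to activate is made by brute-force enumeration instead of an MILP solver. The names `make_disk`, `union_score`, `l1_error`, `select_functions` and `predict`, the penalty weight `lam`, and the toy data are all hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Discriminant = Callable[[Point], float]
Sample = Tuple[Point, int]  # (point, label in {+1, -1})


def make_disk(center: Point, radius: float) -> Discriminant:
    """Hypothetical quadratic discriminant: nonnegative inside the disk."""
    cx, cy = center

    def g(x: Point) -> float:
        return radius ** 2 - ((x[0] - cx) ** 2 + (x[1] - cy) ** 2)

    return g


def union_score(active: Sequence[Discriminant], x: Point) -> float:
    """Assumed union rule: the score is the largest active discriminant value."""
    return max(g(x) for g in active)


def l1_error(active: Sequence[Discriminant], data: Sequence[Sample]) -> float:
    """Sum of hinge-type l1 violations with unit margin (robust-LP stand-in)."""
    return sum(max(0.0, 1.0 - label * union_score(active, x)) for x, label in data)


def select_functions(candidates: List[Discriminant], data: Sequence[Sample],
                     lam: float) -> Tuple[float, float, Tuple[int, ...]]:
    """Enumerate every nonempty 0-1 activation vector z (brute force, in place
    of an MILP solver) and minimize total error + lam * (number of active)."""
    best = None
    for z in product((0, 1), repeat=len(candidates)):
        if not any(z):
            continue  # at least one discriminant function must be active
        active = [g for g, zk in zip(candidates, z) if zk]
        err = l1_error(active, data)
        obj = err + lam * sum(z)
        if best is None or obj < best[0]:
            best = (obj, err, z)
    return best


def predict(active: Sequence[Discriminant], x: Point) -> int:
    return 1 if union_score(active, x) >= 0 else -1


# Toy data: the positive class forms two separate clusters around (0, 0)
# and (4, 4); the negative points lie between and beside them.
candidates = [make_disk((0, 0), 1.5), make_disk((4, 4), 1.5), make_disk((2, 2), 1.5)]
data = [((0, 0), 1), ((0.5, 0), 1), ((4, 4), 1), ((4, 3.5), 1),
        ((2, 2), -1), ((0, 4), -1), ((4, 0), -1)]

obj, err, z = select_functions(candidates, data, lam=0.1)
active = [g for g, zk in zip(candidates, z) if zk]
print(z, err)  # (1, 1, 0) 0.0
print([predict(active, p) for p in [(0.2, 0.2), (2, 2), (3.8, 4.1)]])  # [1, -1, 1]
```

The selected rule keeps the two disks around the positive clusters and drops the decoy disk centered on a negative point. The result is a disjoint decision region that no single linear function could produce. The penalty `lam` favors the least complex rule among those with equal error. In a real MILP, the activation variables and the maximum over functions would typically be modeled with binary variables and big-M constraints and solved by branch and bound rather than enumerated.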
