Rounding Methods for Discrete Linear Classification

Learning discrete linear classifiers is known to be a computationally difficult problem. In this paper, the learning task is cast as a combinatorial optimization problem: given a training sample of positive and negative feature vectors in Euclidean space, the goal is to find a discrete linear function that minimizes the cumulative hinge loss over the sample. Since this problem is NP-hard, we examine two simple rounding algorithms that discretize the fractional solution of the relaxed problem. Generalization bounds are derived for several classes of binary-weighted linear functions by analyzing the Rademacher complexity of these classes and by establishing approximation bounds for our rounding algorithms. Our methods are evaluated on both synthetic and real-world data.
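The two-stage scheme the abstract describes (solve a fractional relaxation, then discretize the weights) can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the fractional solver is omitted, the weights are assumed to lie in [-1, 1], and the two rounding rules shown (unbiased randomized rounding and deterministic sign rounding) are standard choices assumed here for concreteness.

```python
import random

def hinge_loss(w, X, y):
    """Cumulative hinge loss of weight vector w on sample (X, y), labels in {-1, +1}."""
    total = 0.0
    for x, label in zip(X, y):
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        total += max(0.0, 1.0 - margin)
    return total

def randomized_round(w, rng=random):
    """Round a fractional weight vector w in [-1, 1]^d to {-1, +1}^d.

    Each coordinate is set to +1 with probability (1 + w_i) / 2, so the
    rounded vector is unbiased: E[rounded_i] = w_i.
    """
    return [1 if rng.random() < (1.0 + wi) / 2.0 else -1 for wi in w]

def sign_round(w):
    """Deterministic rounding: take the sign of each coordinate."""
    return [1 if wi >= 0 else -1 for wi in w]

# Illustrative use: a hypothetical fractional solution and a tiny sample.
w_frac = [0.8, -0.3, 0.1]
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = [1, -1]
w_discrete = sign_round(w_frac)          # [1, -1, 1]
loss = hinge_loss(w_discrete, X, y)      # cumulative hinge loss of the rounded classifier
```

Randomized rounding can be repeated several times, keeping the discretized vector with the lowest cumulative hinge loss on the training sample; the unbiasedness of the rounding is what approximation arguments of this kind typically exploit.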
