The Complexity of Improperly Learning Large Margin Halfspaces

where 〈w, x〉 is the inner product between the vectors w and x. For the 0-1 transfer function, φ_{0-1}(a) = (sgn(a) + 1)/2, H becomes the class of halfspaces. We allow any transfer function that satisfies the following (μ, ε) margin condition: max{|φ(a) − φ_{0-1}(a)| : |a| > μ} ≤ ε. For example, the sigmoid function φ_sig(a) = 1/(1 + e^{−a/σ}) satisfies the (μ, ε) condition if σ ≤ μ/log(1/ε − 1). For an illustration see Figure 1.

An improper agnostic learning algorithm, A, receives as input a training set of m i.i.d. samples from D and returns a classifier (not necessarily from H). The output classifier is a random variable, which we denote by A(m). We use err(A(m)) to denote the expected generalization error of the predictor returned by A, where the expectation is with respect to the random choice of the training set. We denote by time(A, m) the expected runtime of the algorithm A when running on a training set of m examples.
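To make the (μ, ε) margin condition concrete, the following is a minimal numerical sanity check, not code from the paper: it verifies that the sigmoid transfer function with the largest scale permitted by the bound, σ = μ/log(1/ε − 1), deviates from φ_{0-1} by at most ε outside the margin region. The names phi_01, phi_sig, mu, eps, and sigma, as well as the particular values of μ and ε, are our own illustrative choices.

```python
import numpy as np

# Sketch (our own, under the assumptions stated above): check that
# phi_sig(a) = 1 / (1 + exp(-a / sigma)) satisfies the (mu, eps) condition
#   max{ |phi(a) - phi_01(a)| : |a| > mu } <= eps
# when sigma = mu / log(1/eps - 1), the largest scale the bound allows.

def phi_01(a):
    # 0-1 transfer function: (sgn(a) + 1) / 2
    return (np.sign(a) + 1) / 2

def phi_sig(a, sigma):
    # Sigmoid transfer function with scale parameter sigma.
    return 1 / (1 + np.exp(-a / sigma))

mu, eps = 0.1, 0.01                    # example margin and accuracy
sigma = mu / np.log(1 / eps - 1)       # largest sigma permitted by the bound

# The deviation |phi_sig - phi_01| is largest at |a| = mu and decays as |a|
# grows, so a grid starting at the margin (on both sides, by symmetry)
# suffices for the check.
a = np.linspace(mu, 10 * mu, 1000)
a = np.concatenate([a, -a])
deviation = np.abs(phi_sig(a, sigma) - phi_01(a)).max()

assert deviation <= eps + 1e-12, deviation
print(f"max deviation on |a| >= mu: {deviation:.6f} (eps = {eps})")
```

At a = μ the deviation equals 1 − 1/(1 + e^{−μ/σ}) = ε exactly for this choice of σ, which is why the check passes with equality up to floating-point error; any smaller σ only shrinks the deviation.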
