On the Difficulty of Approximately Maximizing Agreements

We address the computational complexity of learning in the agnostic framework. For a variety of common concept classes we prove that, unless P=NP, there is no polynomial-time approximation scheme for finding a member of the class that approximately maximizes the agreement with a given training sample. In particular, our results apply to the classes of monomials, axis-aligned hyper-rectangles, closed balls, and monotone monomials. For each of these classes we prove the NP-hardness of approximating maximal agreement to within some fixed constant (independent of the sample size and of the dimensionality of the sample space). For the class of half-spaces, we prove that, for any ε > 0, it is NP-hard to approximately maximize agreements to within a factor of 418/415 − ε, improving on the best previously known constant for this problem, and using a simpler proof. An interesting feature of our proofs is that, for each of the classes we discuss, we find patterns of training examples that, while being hard for approximating agreement within that concept class, allow efficient agreement maximization within other concept classes. These results bring up a new aspect of the model selection problem: they imply that the choice of hypothesis class for agnostic learning, from among those considered in this paper, can drastically affect the computational complexity of the learning process.
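To make the optimization problem concrete, the following sketch (not from the paper; the function names, the tiny sample, and the brute-force strategy are our own illustration) computes the maximum-agreement monomial over Boolean attributes by enumerating all 3^n conjunctions of literals. Exhaustive search like this is only feasible for very small n; the hardness results above say that, unless P=NP, no polynomial-time algorithm can even approximate the optimal agreement rate within the stated constants.

from itertools import product

def monomial_predicts(literals, x):
    # literals[i] is 1 (require x[i] == 1), 0 (require x[i] == 0), or None (ignore attribute i).
    return all(x[i] == want for i, want in enumerate(literals) if want is not None)

def max_agreement_monomial(sample):
    # sample: list of (x, y) pairs, x a tuple of 0/1 attributes, y a 0/1 label.
    n = len(sample[0][0])
    best, best_agree = None, -1
    for literals in product((1, 0, None), repeat=n):  # all 3^n monomials
        agree = sum(monomial_predicts(literals, x) == bool(y) for x, y in sample)
        if agree > best_agree:
            best, best_agree = literals, agree
    return best, best_agree / len(sample)

# A sample that no single monomial labels perfectly; the best achievable agreement rate is 4/5.
sample = [((1, 1), 1), ((1, 0), 1), ((0, 1), 1), ((0, 0), 1), ((1, 1), 0)]
print(max_agreement_monomial(sample))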
