Computational complexity of machine learning

This thesis is a study of the computational complexity of machine learning from examples in the distribution-free model introduced by L. G. Valiant [13]. In the distribution-free model, a learning algorithm receives positive and negative examples of an unknown target set (or concept) chosen from some known class of sets (or concept class). These examples are generated randomly according to a fixed but unknown probability distribution representing Nature, and the goal of the learning algorithm is to infer a hypothesis concept that closely approximates the target concept with respect to the unknown distribution. This thesis is concerned with proving theorems about learning in this formal mathematical model.

We are interested in the phenomenon of efficient learning in the distribution-free model, in the standard polynomial-time sense. Our results include general tools for determining the polynomial-time learnability of a concept class, an extensive study of efficient learning when errors are present in the examples, and lower bounds on the number of examples required for learning in our model. A centerpiece of the thesis is a series of results demonstrating the computational difficulty of learning a number of well-studied concept classes. These results are obtained by reducing apparently hard number-theoretic problems from cryptography to the learning problems; the hard-to-learn concept classes include the sets represented by Boolean formulae, deterministic finite automata, and a simplified form of neural networks. We also give algorithms for learning powerful concept classes under the uniform distribution, and we give equivalences between natural models of efficient learnability.

The thesis also includes detailed definitions of and motivation for the distribution-free model, a chapter discussing past research in this model and related models, and a short list of important open problems.
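
To make the model concrete, the short Python sketch below illustrates it on one of the simplest concept classes Valiant showed to be polynomial-time learnable: monotone conjunctions, learned by the standard elimination algorithm. The sketch is illustrative rather than taken from the thesis; the hidden target concept, the uniform "Nature" distribution, and the parameters eps and delta in the toy run are assumptions chosen for the demonstration.

    import math
    import random

    def learn_monotone_conjunction(examples, n):
        # Elimination algorithm: start with all n variables and delete any
        # variable that is 0 in some positive example.  The result is the
        # largest conjunction consistent with the positives; since it always
        # retains every variable of the target, it also labels negatives
        # correctly, so it is consistent with the whole sample.
        hyp = set(range(n))
        for x, label in examples:
            if label:
                hyp = {i for i in hyp if x[i]}
        return hyp

    # Toy run -- target concept and distribution below are illustrative.
    random.seed(0)
    n = 10
    target = {0, 3, 7}  # hidden concept: x0 AND x3 AND x7
    draw = lambda: [random.random() < 0.5 for _ in range(n)]  # "Nature"

    # Consistent-learner sample bound: m >= (1/eps) * (ln|H| + ln(1/delta)),
    # with |H| = 2^n monotone conjunctions over n variables.
    eps, delta = 0.1, 0.05
    m = math.ceil((1 / eps) * (n * math.log(2) + math.log(1 / delta)))

    examples = [(x, all(x[i] for i in target))
                for x in (draw() for _ in range(m))]
    print(learn_monotone_conjunction(examples, n))  # typically {0, 3, 7}

Because the returned hypothesis is consistent with the sample and the class of monotone conjunctions over n variables is finite, the standard argument for finite classes yields the sample size m >= (1/eps)(ln|H| + ln(1/delta)) used above: with probability at least 1 - delta over the draw of the examples, the learned conjunction has error at most eps under the same distribution.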

[1] J. Lamperti. On convergence of stochastic processes, 1962.

[2] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[3] David S. Johnson, et al. Approximation algorithms for combinatorial problems, 1973, STOC '73.

[4] László Lovász, et al. On the ratio of optimal integral and fractional covers, 1975, Discret. Math.

[5] Adi Shamir, et al. A method for obtaining digital signatures and public-key cryptosystems, 1978, CACM.

[6] L. G. Khachiyan. A polynomial algorithm in linear programming, 1979.

[7] Leslie G. Valiant, et al. Fast probabilistic algorithms for Hamiltonian circuits and matchings, 1977, STOC '77.

[8] Richard M. Dudley. Some special Vapnik-Chervonenkis classes, 1981, Discret. Math.

[9] Manuel Blum, et al. How to Generate Cryptographically Strong Sequences of Pseudo Random Bits, 1982, FOCS '82.

[10] Andrew Chi-Chih Yao. Theory and application of trapdoor functions, 1982, FOCS '82.

[11] Avi Wigderson, et al. A new approximate graph coloring algorithm, 1982, STOC '82.

[12] Carl H. Smith, et al. Inductive Inference: Theory and Methods, 1983, CSUR.

[13] Leslie G. Valiant. A theory of the learnable, 1984, STOC '84.

[14] Narendra Karmarkar. A new polynomial-time algorithm for linear programming, 1984, Comb.

[15] Stephen A. Cook, et al. Log Depth Circuits for Division and Related Problems, 1986, SIAM J. Comput.

[16] Evangelos Kranakis. Primality and Cryptography, 1986, Wiley-Teubner Series in Computer Science.

[17] David Haussler, et al. Epsilon-nets and simplex range queries, 1986, SCG '86.

[18] Piotr Berman, et al. Learning one-counter languages in polynomial time, 1987, FOCS '87.

[19] Ronald L. Rivest, et al. Diversity-based inference of finite automata, 1987, FOCS '87.

[20] Balas K. Natarajan. On learning Boolean functions, 1987, STOC '87.

[21] Leslie G. Valiant, et al. On the learnability of Boolean formulae, 1987, STOC '87.

[22] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, FOCS '87.

[23] M. Kearns, et al. Recent Results on Boolean Concept Learning, 1987.

[24] Dana Angluin, et al. Learning with hints, 1988, COLT '88.

[25] Leslie G. Valiant, et al. Functionality in neural nets, 1988, COLT '88.

[26] Ming Li, et al. Learning in the presence of malicious errors, 1988, STOC '88.

[27] Leonard Pitt, et al. Reductions among prediction problems: on the difficulty of predicting automata, 1988, Third Annual Structure in Complexity Theory Conference.

[28] Jeffrey Scott Vitter, et al. Learning in parallel, 1988, COLT '88.

[29] Ronald L. Rivest, et al. Learning complicated concepts reliably and usefully, 1988, COLT '88.

[30] Oded Goldreich, et al. RSA and Rabin Functions: Certain Parts are as Hard as the Whole, 1988, SIAM J. Comput.

[31] J. Stephen Judd, et al. Learning in neural networks, 1988, COLT '88.

[32] George Shackelford, et al. Learning k-DNF with noise in the attributes, 1988, COLT '88.

[33] P. Laird. Learning from Good and Bad Data, 1988.

[34] Nathan Linial, et al. Results on learnability and the Vapnik-Chervonenkis dimension, 1988, FOCS '88.

[35] Leslie G. Valiant, et al. Computational limitations on learning from examples, 1988, JACM.

[36] David Haussler, et al. Predicting {0,1}-functions on randomly drawn points, 1988, COLT '88.

[37] David Haussler, et al. Equivalence of models for polynomial learnability, 1988, COLT '88.

[38] Luc Devroye, et al. Automatic Pattern Recognition: A Study of the Probability of Error, 1988, IEEE Trans. Pattern Anal. Mach. Intell.

[39] Ming Li, et al. A theory of learning simple concepts under simple distributions and average case complexity for the universal distribution, 1989, FOCS '89.

[40] David Haussler, et al. Proceedings of the 1988 Workshop on Computational Learning Theory, MIT, August 3-5, 1988, 1989.

[41] Noam Nisan, et al. Constant depth circuits, Fourier transform, and learnability, 1989, FOCS '89.

[42] David Haussler, et al. Generalizing the PAC model: sample size bounds from metric dimension-based uniform convergence results, 1989, FOCS '89.

[43] Ronald L. Rivest, et al. Inference of finite automata using homing sequences, 1989, STOC '89.

[44] Leonard Pitt, et al. A polynomial-time algorithm for learning k-variable pattern languages from examples, 1989, COLT '89.

[45] Karsten A. Verbeurgt. Learning DNF under the uniform distribution in quasi-polynomial time, 1990, COLT '90.

[46] Leonard Pitt, et al. On the necessity of Occam algorithms, 1990, STOC '90.

[47] Robert E. Schapire, et al. Efficient distribution-free learning of probabilistic concepts, 1990, FOCS '90.

[48] Ronald L. Rivest, et al. The Design and Analysis of Computer Algorithms, 1990.

[49] J. Reif, et al. On Threshold Circuits and Polynomial Computation, 1992, SIAM J. Comput.

[50] Robert H. Sloan, et al. Corrigendum to types of noise in data for concept learning, 1992, COLT '92.