Computational sample complexity and attribute-efficient learning

Two fundamental measures of the efficiency of a learning algorithm are its running time and the number of examples it requires (its sample complexity). The importance of polynomial running time has long been acknowledged in learning theory, while recent work on attribute-efficiency has focused attention on algorithms that can learn from few examples. In this paper we demonstrate that even for simple concept classes, an inherent tradeoff can exist between running time and sample complexity. In our first construction, we present a concept class of 1-decision lists and prove that, while a computationally unbounded learner can learn the class from O(1) examples, under a standard cryptographic assumption any polynomial-time learner requires almost Θ(n) examples. Using a different construction, we present a concept class of k-decision lists which exhibits a similar but stronger gap in sample complexity. These results strengthen the results of Decatur, Goldreich and Ron [9] on distribution-free computational sample complexity and come within a logarithmic factor of the largest possible gap for concept classes of k-decision lists. Finally, we construct a concept class of decision lists which can be learned attribute-efficiently, and can be learned in polynomial time, but cannot be learned attribute-efficiently in polynomial time. This is the first result showing that attribute-efficient learning can be computationally hard. The main tools we use are one-way permutations, error-correcting codes and pseudorandom generators.
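For readers unfamiliar with the concept class at the center of these results, the sketch below illustrates how a k-decision list classifies an example: it scans an ordered sequence of rules, each pairing a conjunction of at most k literals with an output bit, and returns the bit of the first rule whose conjunction the example satisfies (or a default bit if none fires); a 1-decision list is the special case where each conjunction is a single literal. This is a minimal illustrative sketch in Python, not code from the paper; the representation of literals and all names are our own.

    # Minimal sketch of a k-decision list over n Boolean attributes.
    # A rule is (conjunction, output_bit); a literal (i, True) requires
    # x[i] == 1, and a literal (i, False) requires x[i] == 0.

    from typing import List, Tuple

    Literal = Tuple[int, bool]          # (attribute index, required value)
    Rule = Tuple[List[Literal], int]    # (conjunction of <= k literals, output bit)

    def evaluate_decision_list(rules: List[Rule], default: int, x: List[int]) -> int:
        """Return the output bit of the first rule whose conjunction x satisfies."""
        for conjunction, output in rules:
            if all(x[i] == int(value) for i, value in conjunction):
                return output
        return default

    # Example: a 2-decision list on n = 4 attributes, encoding
    # "if x0 and not x3 then 1; else if x2 then 0; else 1".
    rules = [
        ([(0, True), (3, False)], 1),
        ([(2, True)], 0),
    ]
    print(evaluate_decision_list(rules, default=1, x=[1, 0, 0, 0]))  # -> 1 (first rule fires)
    print(evaluate_decision_list(rules, default=1, x=[0, 0, 1, 0]))  # -> 0 (second rule fires)
    print(evaluate_decision_list(rules, default=1, x=[0, 0, 0, 1]))  # -> 1 (default bit)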

[1] Ronald L. Rivest. Learning decision lists, 1987, Machine Learning.

[2] Lisa Hellerstein et al. Learning in the presence of finitely or infinitely many irrelevant attributes, 1991, COLT '91.

[3] Oded Goldreich. Foundations of Cryptography (Fragments of a Book), 1995.

[4] Michael Kharitonov et al. Cryptographic lower bounds for learnability of Boolean functions on the uniform distribution, 1992, COLT '92.

[5] Robert E. Schapire et al. Efficient distribution-free learning of probabilistic concepts, 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science.

[6] Ryuhei Uehara et al. Optimal Attribute-Efficient Learning of Disjunction, Parity and Threshold Functions, 1997, EuroCOLT.

[7] Leonid A. Levin et al. A hard-core predicate for all one-way functions, 1989, STOC '89.

[8] David Haussler et al. Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework, 1988, Artif. Intell.

[9] Dana Ron et al. Computational sample complexity, 1997, COLT '97.

[10] Lisa Hellerstein et al. Attribute-Efficient Learning in Query and Mistake-Bound Models, 1998, J. Comput. Syst. Sci.

[11] Mihir Bellare et al. Lecture Notes on Cryptography, 2001.

[12] Michael Kearns et al. Cryptographic limitations on learning Boolean formulae and finite automata, 1994.

[13] Leslie G. Valiant. Projection learning, 1998, COLT '98.

[14] Manuel Blum et al. How to Generate Cryptographically Strong Sequences of Pseudo Random Bits, 1982, FOCS.

[15] Rusins Freivalds et al. Fast Probabilistic Algorithms, 1979, MFCS.

[16] Avrim Blum et al. On-line Algorithms in Machine Learning, 1996, Online Algorithms.

[17] Ido Dagan et al. Mistake-Driven Learning in Text Categorization, 1997, EMNLP.

[18] N. Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms, 1990.

[19] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science.

[20] Umesh V. Vazirani et al. An Introduction to Computational Learning Theory, 1994.

[21] Leslie G. Valiant et al. A general lower bound on the number of examples needed for learning, 1988, COLT '88.

[22] Leslie G. Valiant et al. Cryptographic limitations on learning Boolean formulae and finite automata, 1994, J. ACM.

[23] M. Kearns et al. Recent Results on Boolean Concept Learning, 1987.

[24] Leslie G. Valiant. A theory of the learnable, 1984, STOC '84.

[25] Daniel A. Spielman et al. Expander codes, 1994, Proceedings of the 35th Annual Symposium on Foundations of Computer Science.

[26] Daniel A. Spielman et al. Linear-time encodable and decodable error-correcting codes, 1995, STOC '95.

[27] Leslie G. Valiant et al. Fast probabilistic algorithms for Hamiltonian circuits and matchings, 1977, STOC '77.

[28] Leslie G. Valiant et al. Cryptographic Limitations on Learning Boolean Formulae and Finite Automata, 1993, Machine Learning: From Theory to Applications.

[29] Nick Littlestone. Learning Quickly When Irrelevant Attributes Abound, 1988.

[30] Temple F. Smith. Occam's razor, 1980, Nature.

[31] Oded Goldreich. Modern Cryptography, Probabilistic Proofs and Pseudorandomness, 1998, Algorithms and Combinatorics.

[32] David Haussler et al. A general lower bound on the number of examples needed for learning, 1989.

[33] Manuel Blum et al. How to generate cryptographically strong sequences of pseudo random bits, 1982, 23rd Annual Symposium on Foundations of Computer Science.