Attribute-Efficient Learning and Weight-Degree Tradeoffs for Polynomial Threshold Functions

We study the challenging problem of learning decision lists attribute-efficiently, giving both positive and negative results. Our main positive result is a new tradeoff between the running time and mistake bound for learning length-k decision lists over n Boolean variables. When the allowed running time is relatively high, our new mistake bound improves significantly on the mistake bound of the best previous algorithm, due to Klivans and Servedio (2006). Our main negative result is a new lower bound on the weight of any degree-d polynomial threshold function (PTF) that computes a particular decision list over k variables (the “ODD-MAXBIT” function). The main result of Beigel (1994) is a weight lower bound of 2^{Ω(k/d²)}, which Klivans and Servedio showed to be essentially optimal for d ≤ k^{1/3}. Here we prove a 2^{Ω(√k/d)} lower bound, which improves on Beigel’s lower bound for d > k^{1/3}. This lower bound establishes strong limitations on the effectiveness of the Klivans–Servedio approach and suggests that it may be difficult to improve on our positive result. The main tool used in our lower bound is a new variant of Markov’s classical inequality, which may be of independent interest; it bounds the derivative of a univariate polynomial in terms of both its degree and the size of its coefficients.
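To make the central object concrete, the following is a minimal sketch of the ODD-MAXBIT function written as a decision list. It uses one common convention (1-based indices, output determined by the parity of the highest-index set bit); conventions in the literature vary, so treat the exact parity/indexing choices here as illustrative assumptions rather than the paper's definition.

```python
def odd_max_bit(x):
    """ODD-MAXBIT on a 0/1 list x: outputs 1 iff the highest-index set bit
    has odd index (1-based). Assumed convention for illustration only.

    Structured as a decision list: conditions are tested in a fixed order,
    the first satisfied condition determines the output, with a default
    output if no condition fires.
    """
    for i in range(len(x), 0, -1):  # scan indices n, n-1, ..., 1
        if x[i - 1] == 1:           # first satisfied condition...
            return i % 2            # ...fixes the output (parity of i)
    return 0                        # default output of the decision list
```

Each "if x_i then output b_i" line plays the role of one entry of a length-k decision list, which is why weight/degree lower bounds for PTFs computing this function constrain algorithms that learn decision lists via low-degree PTF representations.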

[1]  Richard Beigel. Perceptrons, PP, and the polynomial hierarchy, 1994, Computational Complexity.

[2]  Ran El-Yaniv, et al. On Online Learning of Decision Lists, 2002, J. Mach. Learn. Res.

[3]  N. Littlestone, et al. Learning in the presence of finitely or infinitely many irrelevant attributes, 1991, COLT '91.

[4]  Alexander A. Sherstov, et al. Lower Bounds for Agnostic Learning via Approximate Rank, 2010, Computational Complexity.

[5]  Salil P. Vadhan, et al. Computational Complexity, 2005, Encyclopedia of Cryptography and Security.

[6]  Nick Littlestone, et al. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm, 1988, Machine Learning.

[7]  Alexander A. Sherstov. Separating AC0 from depth-2 majority circuits, 2007, STOC '07.

[8]  Vitaly Feldman, et al. Evolvability from learning algorithms, 2008, STOC.

[9]  Harry Buhrman, et al. On Computation and Communication with Small Bias, 2007, Twenty-Second Annual IEEE Conference on Computational Complexity (CCC '07).

[10]  Rocco A. Servedio, et al. Attribute-efficient learning of decision lists and linear threshold functions under unconcentrated distributions, 2006, NIPS.

[11]  Alexander A. Sherstov. Halfspace Matrices, 2007, Twenty-Second Annual IEEE Conference on Computational Complexity (CCC '07).

[12]  Leslie G. Valiant. Projection learning, 1998, COLT '98.

[13]  Alexander A. Sherstov. The Pattern Matrix Method, 2009, SIAM J. Comput.

[14]  Alexander A. Sherstov. Separating AC0 from Depth-2 Majority Circuits, 2009, SIAM J. Comput.

[15]  Tamás Erdélyi, et al. Markov-Bernstein type inequalities under Littlewood-type coefficient constraints, 2000.

[16]  Rocco A. Servedio, et al. Toward Attribute Efficient Learning of Decision Lists and Parities, 2006, J. Mach. Learn. Res.

[17]  Pavel Pudlák, et al. Computing Boolean functions by polynomials and threshold circuits, 1998, Computational Complexity.

[18]  Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.

[19]  Noam Nisan, et al. On the degree of Boolean functions as real polynomials, 1992, STOC '92.

[20]  Avrim Blum. Learning Boolean Functions in an Infinite Attribute Space (Extended Abstract), 1990, STOC '90.

[21]  Pavel Pudlák, et al. On computing Boolean functions by sparse real polynomials, 1995, Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science (FOCS).