On Online Learning of Decision Lists

A fundamental open problem in computational learning theory is whether there is an attribute-efficient learning algorithm for the concept class of decision lists (Rivest, 1987; Blum, 1996). We consider a weaker problem, where the concept class is restricted to decision lists with D alternations. For this class, we present a novel online algorithm that achieves a mistake bound of O(r^D log n), where r is the number of relevant variables and n is the total number of variables. The algorithm can be viewed as a strict generalization of the famous Winnow algorithm by Littlestone (1988), and improves on the O(r^{2D} log n) mistake bound of Balanced Winnow. Our bound is stronger than a similar PAC-learning result of Dhagat and Hellerstein (1994). A combination of our algorithm with the algorithm suggested by Rivest (1987) might achieve even better bounds.
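To make the concept class concrete, here is a minimal Python sketch of a decision list over Boolean variables and a count of its alternations. It is illustrative only and is not the algorithm of the paper. The rule encoding, the function names `evaluate` and `alternations`, and the convention that the default label takes part in the alternation count are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: a rule is (variable index, required value, output label).
Rule = Tuple[int, int, int]


def evaluate(rules: List[Rule], default: int, x: Sequence[int]) -> int:
    """Return the label of the first rule whose test x[var] == val succeeds."""
    for var, val, label in rules:
        if x[var] == val:
            return label
    # No rule fired, so fall through to the default label.
    return default


def alternations(rules: List[Rule], default: int) -> int:
    """Count label changes along the list (assumed convention: default included)."""
    labels = [label for _, _, label in rules] + [default]
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


# A list over n = 3 variables whose output labels read 1, 1, 0, then default 1.
example_rules: List[Rule] = [(0, 1, 1), (2, 1, 1), (1, 0, 0)]
example_default = 1
```

Under this convention the example list has two alternations (1 to 0, then 0 to 1), so it would belong to the restricted class for any D of at least 2.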

[1] David Haussler, et al. Occam's Razor, 1987, Inf. Process. Lett.

[2] Leslie G. Valiant. Projection learning, 1998, COLT '98.

[3] Manfred K. Warmuth, et al. Learning nested differences of intersection-closed concept classes, 2004, Machine Learning.

[4] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[5] Lisa Hellerstein, et al. Attribute-Efficient Learning in Query and Mistake-Bound Models, 1998, J. Comput. Syst. Sci.

[6] Rocco A. Servedio. Computational Sample Complexity and Attribute-Efficient Learning, 2000, J. Comput. Syst. Sci.

[7] N. Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms, 1990.

[8] Martin Anthony, et al. Computational learning theory: an introduction, 1992.

[9] Lisa Hellerstein, et al. Learning in the presence of finitely or infinitely many irrelevant attributes, 1991, COLT '91.

[10] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[11] Mona Singh, et al. Learning functions of k terms, 1990, COLT '90.

[12] Rocco A. Servedio. Computational sample complexity and attribute-efficient learning, 1999, STOC '99.

[13] Vijay Raghavan, et al. Monotone term decision lists, 2001, Theor. Comput. Sci.

[14] Lisa Hellerstein, et al. PAC learning with irrelevant attributes, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[15] Avrim Blum, et al. On-line Algorithms in Machine Learning, 1996, Online Algorithms.

[16] Dana Angluin, et al. Queries and concept learning, 1988, Machine Learning.

[17] Ronald L. Rivest, et al. Learning decision lists, 2004, Machine Learning.

[18] Toshihide Ibaraki, et al. Decision lists and related Boolean functions, 2002, Theor. Comput. Sci.

[19] Avrim Blum. Rank-r Decision Trees are a Subclass of r-Decision Lists, 1992, Inf. Process. Lett.