Experiments with Projection Learning

Excessive information is known to degrade the classification performance of many machine learning algorithms. Attribute-efficient learning algorithms tolerate irrelevant attributes without a significant loss of performance. Valiant's projection learning is a way of combining such algorithms so that this desirable property is preserved. The archetypal attribute-efficient learning algorithm Winnow and, especially, combinations of Winnow have proved empirically successful in domains with many attributes. However, projection learning as proposed by Valiant has not yet been evaluated empirically. We study how projection learning compares with using Winnow on its own and with an extended attribute set. We also compare projection learning with decision tree learning and Naive Bayes on UCI data sets. Projection learning systematically improves the classification accuracy of Winnow, but the cost in time and space can be high. Balanced Winnow appears to be a better choice than the basic algorithm for learning the projection hypotheses; however, it is not well suited for learning the second-level (projective disjunction) hypothesis. As an on-line approach, projection learning does not fall far behind batch algorithms such as decision tree learning and Naive Bayes in classification accuracy on the UCI data sets that we used.
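For readers unfamiliar with the base learner, the following is a minimal sketch of Littlestone's Winnow (the multiplicative-update variant sometimes called Winnow2) on Boolean attributes, which projection learning uses as its building block. The function name, pass count, and example generator are illustrative choices, not taken from the paper; the attribute-efficiency property is visible in that only two of the hundred attributes below are relevant.

```python
import random

def winnow_train(examples, n, alpha=2.0, passes=50):
    """Train Winnow (multiplicative updates) on binary examples.

    examples: list of (x, y) with x a 0/1 tuple of length n, y in {0, 1}.
    Weights start at 1; the prediction threshold is fixed at n.
    """
    w = [1.0] * n
    theta = float(n)
    for _ in range(passes):
        mistakes = 0
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if pred == y:
                continue
            mistakes += 1
            if y == 1:
                # Promotion (false negative): boost weights of active attributes.
                w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
            else:
                # Demotion (false positive): shrink weights of active attributes.
                w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
        if mistakes == 0:   # consistent with the training data; stop early
            break
    return w

# Target concept: the disjunction x0 OR x1 over 100 attributes,
# so 98 attributes are irrelevant noise.
random.seed(0)
n = 100
data = []
for _ in range(200):
    x = tuple(random.randint(0, 1) for _ in range(n))
    data.append((x, 1 if x[0] or x[1] else 0))
w = winnow_train(data, n)
```

Because Winnow's mistake bound for a k-literal disjunction grows only logarithmically in the total number of attributes, the run above converges after a handful of mistakes despite the 98 irrelevant inputs; a Perceptron-style additive update would typically need far more.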
