Noise-Tolerant Occam Algorithms and Their Applications to Learning Decision Trees

In Valiant's distribution-independent model of concept learning, Angluin and Laird introduced a formal model of noise, called the classification noise process, to study how to compensate for randomly introduced errors, or noise, in the classification of example data. In this article, we investigate the problem of designing efficient learning algorithms in the presence of classification noise. First, we develop a technique for building efficient robust learning algorithms, called noise-tolerant Occam algorithms, and show that, using them, one can construct a polynomial-time algorithm for learning a class of Boolean functions in the presence of classification noise. Next, as an instance of such learning problems, we focus on learning Boolean functions represented by decision trees. We present a noise-tolerant Occam algorithm for k-DL (the class of decision lists with conjunctive clauses of size at most k at each decision, introduced by Rivest) and hence conclude that k-DL is polynomially learnable in the presence of classification noise. Further, we extend the noise-tolerant Occam algorithm for k-DL to one for r-DT (the class of decision trees of rank at most r, introduced by Ehrenfeucht and Haussler) and conclude that r-DT is also polynomially learnable in the presence of classification noise.
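
As an informal illustration of the flavor of such noise-tolerant learners (this is a sketch, not the article's construction), the following Python fragment greedily builds a decision list over Boolean attributes while tolerating label noise: instead of demanding a test that classifies all covered examples correctly, it accepts a test whose empirical disagreement rate stays within an assumed tolerance (set somewhat above the presumed noise rate) and prefers the test covering the most remaining examples. All function names, the tolerance parameter, and the toy target below are hypothetical.

# Illustrative sketch only: a greedy, noise-tolerant learner for decision lists
# over Boolean attributes.  This is NOT the article's noise-tolerant Occam
# algorithm; it only shows the idea of accepting a small empirical disagreement
# instead of exact consistency when labels may be flipped by classification noise.
from itertools import product
import random


def literals(n):
    # A literal is a pair (attribute index, required value), i.e. "x[i] == v".
    return [(i, v) for i in range(n) for v in (0, 1)]


def learn_noisy_decision_list(sample, n, tolerance=0.2):
    """Greedily build ([(literal, label), ...], default_label) from noisy data."""
    remaining = list(sample)
    rules = []
    while remaining:
        candidates = []
        for (i, v), label in product(literals(n), (0, 1)):
            covered = [y for x, y in remaining if x[i] == v]
            if not covered:
                continue
            rate = sum(1 for y in covered if y != label) / len(covered)
            candidates.append((rate, -len(covered), (i, v), label))
        # Prefer the widest-coverage rule whose empirical disagreement rate is
        # within the tolerance; if none qualifies, fall back to the rule with
        # the smallest disagreement rate.
        acceptable = [c for c in candidates if c[0] <= tolerance]
        _, _, lit, label = (min(acceptable, key=lambda c: c[1])
                            if acceptable else min(candidates))
        rules.append((lit, label))
        # Discard the examples this rule decides and continue on the rest.
        remaining = [(x, y) for x, y in remaining if x[lit[0]] != lit[1]]
    default = max((0, 1), key=lambda b: sum(1 for _, y in sample if y == b))
    return rules, default


def predict(rules, default, x):
    for (i, v), label in rules:
        if x[i] == v:
            return label
    return default


if __name__ == "__main__":
    random.seed(0)
    n, m, eta = 6, 2000, 0.1       # attributes, sample size, noise rate (hypothetical)
    # Hypothetical target decision list: "if x2 then 1; else if not x0 then 0; else 1".
    target_rules, target_default = [((2, 1), 1), ((0, 0), 0)], 1

    def draw(noisy):
        x = tuple(random.randint(0, 1) for _ in range(n))
        y = predict(target_rules, target_default, x)
        if noisy and random.random() < eta:
            y = 1 - y              # classification noise: flip the label
        return x, y

    train = [draw(noisy=True) for _ in range(m)]
    rules, default = learn_noisy_decision_list(train, n, tolerance=2 * eta)
    test = [draw(noisy=False) for _ in range(1000)]
    error = sum(predict(rules, default, x) != y for x, y in test) / len(test)
    print("clean test error of the learned decision list:", error)

The reason the sketch compares disagreement against a tolerance rather than requiring consistency is that, under the classification noise model, even a correct rule disagrees with roughly a noise-rate fraction of the labels it covers.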

[1] Robert E. Schapire. Design and analysis of efficient learning algorithms, 1992, ACM Doctoral Dissertation Award 1991.

[2] Michael Kearns. Computational complexity of machine learning, 1990, ACM Distinguished Dissertations.

[3] Leslie G. Valiant. Learning Disjunctions of Conjunctions, 1985, IJCAI.

[4] David Haussler, et al. Learning decision trees from random examples, 1988, COLT '88.

[5] P. Laird. Learning from Good and Bad Data, 1988.

[6] Leo Breiman, et al. Classification and Regression Trees, 1984.

[7] B. Natarajan. On learning sets and functions, 1989, Machine Learning.

[8] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[9] Leslie G. Valiant, et al. Computational limitations on learning from examples, 1988, JACM.

[10] J. Ross Quinlan. Induction of Decision Trees, 1986, Machine Learning.

[11] D. Angluin, et al. Learning From Noisy Examples, 1988, Machine Learning.

[12] Paul E. Utgoff. Incremental Induction of Decision Trees, 1989, Machine Learning.

[13] Ming Li, et al. Learning in the presence of malicious errors, 1988, STOC '88.

[14] Robert H. Sloan. Corrigendum to types of noise in data for concept learning, 1992, COLT '92.

[15] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[16] Robert E. Schapire, et al. Exact identification of circuits using fixed points of amplification functions, 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science (FOCS).

[17] John Mingers. An Empirical Comparison of Pruning Methods for Decision Tree Induction, 1989, Machine Learning.

[18] David Haussler, et al. Occam's Razor, 1987, Inf. Process. Lett.

[19] Peter Clark, et al. The CN2 Induction Algorithm, 1989, Machine Learning.

[20] Leslie G. Valiant. A theory of the learnable, 1984, CACM.

[21] Ronald L. Rivest. Learning decision lists, 1987, Machine Learning.