Upper Bounds for Error Rates of Linear Combinations of Classifiers

A useful notion of weak dependence between many classifiers constructed from the same training data is introduced. It is shown that if this weak dependence is low and the expected margins are large, then decision rules based on linear combinations of these classifiers can achieve error rates that decrease exponentially fast. Empirical results with randomized trees, and with trees constructed via boosting and bagging, show that weak dependence is present in such trees. These results also suggest a trade-off between weak dependence and expected margins: to compensate for low expected margins, the mutual dependence between the classifiers in the linear combination should be low.
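As a rough illustration of the two quantities the abstract relates, the Python sketch below estimates the expected margin of an unweighted vote over randomized trees, together with a simple proxy for mutual dependence, namely the average pairwise correlation of the trees' error indicators. This is a minimal sketch under stated assumptions, not the paper's construction: the synthetic dataset, the use of scikit-learn's RandomForestClassifier as a source of randomized trees, and the error-correlation proxy for weak dependence are all illustrative choices.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic data; any binary classification task would do.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Randomized trees stand in for the classifiers in the linear combination.
forest = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
forest.fit(X_tr, y_tr)

# Per-tree predictions on held-out data, shape (n_trees, n_samples).
preds = np.array([tree.predict(X_te) for tree in forest.estimators_])
correct = (preds == y_te).astype(float)

# Margin of the unweighted vote on each example: fraction of trees voting
# for the true label minus the fraction voting against it.
margins = 2.0 * correct.mean(axis=0) - 1.0
print("empirical expected margin:", margins.mean())

# Crude dependence proxy: mean pairwise correlation of the trees' error
# indicators (nanmean guards against trees with zero test error).
errors = 1.0 - correct
corr = np.corrcoef(errors)
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
print("mean pairwise error correlation:", np.nanmean(off_diag))

Comparing these two numbers across ensemble configurations (deeper trees, more estimators, boosting in place of bagging) gives an empirical feel for the trade-off described above: combinations whose members are less mutually dependent can tolerate smaller expected margins, and vice versa.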
