Discussion of the paper "Arcing Classifiers" by Leo Breiman

We would like to thank Leo Breiman for his interest in our work on boosting, for his extensive experiments with the AdaBoost algorithm (which he calls arc-fs), and for his very generous exposition of our work to the statistics community. Breiman's experiments and our intensive email communication over the last two years have inspired us to think about boosting in new ways. These new ways of thinking, in turn, led us to consider new ways of measuring the performance of the boosting algorithm and of predicting its performance on out-of-sample instances. It is exciting for us to have this channel of communication with such a prominent practical statistician.

As computer scientists, we try to derive our algorithms from theoretical frameworks. While these frameworks cannot capture all of our prior beliefs about the nature of real-world problems, they can sometimes capture important aspects of a problem in new and useful ways. In our case, boosting was originally derived as an answer to a theoretical question posed by Kearns and Valiant [7] within the PAC framework, a model for the theoretical study of machine learning first proposed by Valiant [15]. We would probably never have thought of these algorithms had the theoretical question not been posed. An experimental statistician such as Leo, on the other hand, is usually more interested in the actual behavior of algorithms on existing data sets and pays close attention to the values of various quantities during a run of the algorithm.

Running AdaBoost on several synthetic and real-world data sets, Breiman observed that the algorithm achieves surprisingly low generalization error, which, while consistent with our theory at the time, was not predicted by it. It is this challenge from the experiments of Breiman reported here, as well as those of Drucker and Cortes [3] and Quinlan [10], that prompted us to think harder about the problem and to come up with a new theoretical explanation of this surprising behavior, which we describe in our paper with Bartlett and Lee [13]. This explanation, based on the margins of the training examples, suggests new measurable parameters that can be tested in experiments, and the adventure continues! Theory suggests new algorithms and experiments, while experiments give rise to new observations that challenge the theory to come up with tighter bounds.

Our communication with Leo has been challenging and exciting. We hope to see further communication develop between researchers in computational learning theory and statistics.
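For readers who would like to experiment with the algorithm and with the margin quantities mentioned above, the following is a minimal sketch of AdaBoost with decision stumps as the weak learner, together with the normalized margins that are central to the explanation in [13]. It is illustrative only, under our own simplifying assumptions; the function names and the synthetic data are not from Breiman's arc-fs implementation or from the experiments in any of the papers cited here.

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the (feature, threshold, sign) stump with lowest weighted error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] <= t, 1, -1)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best[1:]

def stump_predict(stump, X):
    j, t, s = stump
    return s * np.where(X[:, j] <= t, 1, -1)

def adaboost(X, y, T=50):
    """AdaBoost for labels y in {-1, +1}; returns weak hypotheses and weights."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(T):
        stump = train_stump(X, y, w)
        pred = stump_predict(stump, X)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak hypothesis
        w *= np.exp(-alpha * y * pred)          # up-weight the misclassified examples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def margins(stumps, alphas, X, y):
    """Normalized margin y*f(x)/sum|alpha| of each example, a value in [-1, 1]."""
    F = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return y * F / np.sum(np.abs(alphas))

if __name__ == "__main__":
    # Small synthetic example (purely illustrative).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    stumps, alphas = adaboost(X, y, T=20)
    m = margins(stumps, alphas, X, y)
    print("training error:", np.mean(m <= 0))
    print("minimum margin:", m.min())
```

Tracking how the whole distribution of these margins evolves as more rounds of boosting are run, rather than only the training error, is one example of the kind of measurable parameter that the margin-based explanation suggests.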