An Introduction to Boosting and Leveraging

We provide an introduction to theoretical and practical aspects of Boosting and ensemble learning, intended both as a reference for researchers in the field of Boosting and as a starting point for those seeking to enter this area of research. We begin with a short background on the learning-theoretic foundations of weak learners and their linear combinations. We then point out the useful connection between Boosting and the theory of optimization, which both facilitates the understanding of Boosting and enables the derivation of new Boosting algorithms applicable to a broad spectrum of problems. To increase the relevance of the paper to practitioners, we include remarks, pseudocode, "tricks of the trade", and algorithmic considerations where appropriate. Finally, we illustrate the usefulness of Boosting algorithms by surveying some existing applications. The main ideas are illustrated on the problem of binary classification, although several extensions are discussed.
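The central example of the abstract, Boosting for binary classification, can be made concrete with a minimal sketch of AdaBoost over decision stumps. This is an illustrative reconstruction of the classical algorithm, not code from the paper; the function and variable names are our own.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=20):
    """Minimal AdaBoost with decision stumps, for labels y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)            # uniform initial distribution over examples
    ensemble = []                      # list of (alpha, feature, threshold, sign)
    for _ in range(n_rounds):
        # Weak learner: exhaustively pick the stump (feature j, threshold t,
        # orientation s) with the smallest weighted error under w.
        best = None
        for j in range(d):
            for t in np.unique(X[:, j]):
                for s in (+1, -1):
                    pred = s * np.where(X[:, j] <= t, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        if err >= 0.5:                 # no stump beats random guessing: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        ensemble.append((alpha, j, t, s))
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred) # exponential reweighting of examples
        w /= w.sum()                   # renormalize to a distribution
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of the stumps."""
    f = np.zeros(len(X))
    for alpha, j, t, s in ensemble:
        f += alpha * s * np.where(X[:, j] <= t, 1, -1)
    return np.sign(f)
```

The exponential reweighting step is the point of contact with the optimization view discussed in the paper: each round can be read as a coordinate-descent step on an exponential loss over the linear combination of weak hypotheses.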
