BOOSTED TREES FOR ECOLOGICAL MODELING AND PREDICTION

Accurate prediction and explanation are fundamental objectives of statistical analysis, yet they seldom coincide. Boosted trees are a statistical learning method that attains both objectives for regression and classification analyses. They accommodate many types of response variables (numeric, categorical, and censored), loss functions (Gaussian, binomial, Poisson, and robust), and predictors (numeric and categorical). Interactions between predictors can also be quantified and visualized. The theory underpinning boosted trees is presented, together with interpretive techniques. A new form of boosted trees, namely "aggregated boosted trees" (ABT), is proposed and, in a simulation study, is shown to reduce prediction error relative to boosted trees. A regression data set is analyzed using ABT to illustrate the technique and to compare it with other methods, including boosted trees, bagged trees, random forests, and generalized additive models. A software package for ABT analysis using the R software environment is included in the Appendices, together with worked examples.
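
The Appendices supply a software package for ABT analysis in R. The code below is only a minimal sketch of the underlying idea (bagging, i.e., bootstrap aggregation, of boosted trees), written with the CRAN package gbm rather than the package supplied with the paper; the wrapper names fit_abt and abt_predict and all tuning values (25 bootstrap samples, 1,000 trees, interaction depth 3, shrinkage 0.01) are illustrative assumptions, not settings taken from the paper.

## Minimal sketch of aggregated boosted trees (ABT) as bagged boosted trees,
## using the CRAN package 'gbm'; not the package supplied in the Appendices.
library(gbm)

fit_abt <- function(formula, data, n_bags = 25, n_trees = 1000) {
  ## Fit one boosted-tree model to each of n_bags bootstrap samples of the rows
  lapply(seq_len(n_bags), function(i) {
    boot <- data[sample(nrow(data), replace = TRUE), ]
    gbm(formula, data = boot, distribution = "gaussian",
        n.trees = n_trees, interaction.depth = 3,
        shrinkage = 0.01, bag.fraction = 0.5)
  })
}

abt_predict <- function(models, newdata, n_trees = 1000) {
  ## Average the predictions of the bagged boosted-tree models
  preds <- sapply(models, predict, newdata = newdata, n.trees = n_trees)
  rowMeans(preds)
}

## Example usage (hypothetical data frame 'dat' with numeric response y):
##   models <- fit_abt(y ~ ., data = dat)
##   yhat   <- abt_predict(models, newdata = dat)

Averaging predictions over bootstrap replicates in this way is the aggregation step that, in the abstract's simulation study, reduced prediction error relative to a single boosted-tree fit.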
