Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction

Modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. Second, the technique should have built-in automatic search mechanisms that select the most appropriate values of the input model parameters for each species, so that minimal effort is required to fine-tune these parameters for individual tree species. We evaluated four statistical models, Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS), for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model. As a test, we applied these techniques to four tree species common in the eastern United States: loblolly pine (Pinus taeda), sugar maple (Acer saccharum), American beech (Fagus grandifolia), and white oak (Quercus alba). When the four techniques were assessed with Kappa and fuzzy Kappa statistics, RF and BT were superior in reproducing the current importance value distributions (importance value being a measure of basal area in addition to abundance) for the four tree species, as derived from approximately 100,000 USDA Forest Service Forest Inventory and Analysis plots. Future estimates of suitable habitat after climate change were visually more reasonable with BT and RF, with slightly better performance by RF as assessed by Kappa statistics, correlation estimates, and the spatial distribution of importance values. Although RTA did not perform as well as BT and RF, it provided interpretive models for species whose distributions were captured well by our current set of predictors. MARS was adequate for predicting current distributions but unacceptable for future climate.
We consider RTA, BT, and RF modeling approaches, especially when used together to take advantage of their individual strengths, to be robust for predictive mapping and recommend their inclusion in the ecological toolbox.
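
The Kappa statistic used above to compare predicted and observed maps can be illustrated with a minimal sketch. It assumes that importance values have already been binned into discrete classes on a cell-by-cell basis; the function name `cohen_kappa`, the class codes, and the eight-cell example are hypothetical and are not taken from the study. Only the standard (crisp) Kappa is shown; fuzzy Kappa, which also credits near matches in class and location, is not implemented here.

```python
from collections import Counter


def cohen_kappa(observed, predicted):
    """Cohen's Kappa for two equally long sequences of class labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of
    agreeing cells and p_e is the agreement expected by chance given the
    class frequencies of each map.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("maps must be non-empty and of equal length")

    n = len(observed)
    # Fraction of cells where the two maps assign the same class.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n

    # Chance agreement from the marginal class frequencies of each map.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)

    if p_e == 1.0:
        # Both maps consist of one identical class; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical importance-value classes (0 = absent ... 3 = high) for 8 cells.
observed_classes = [0, 0, 1, 1, 2, 2, 2, 3]
predicted_classes = [0, 1, 1, 1, 2, 2, 3, 3]

kappa = cohen_kappa(observed_classes, predicted_classes)
print(f"kappa = {kappa:.3f}")
```

In this toy example six of eight cells agree (p_o = 0.75) and chance agreement is 0.25, giving a Kappa of about 0.667. A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance.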

[28]  A. Prasad,et al.  PREDICTING ABUNDANCE OF 80 TREE SPECIES FOLLOWING CLIMATE CHANGE IN THE EASTERN UNITED STATES , 1998 .

[29]  J. Chambers Programming with Data: A Guide to the S Language , 1998 .

[30]  J. Franklin Predicting the distribution of shrub species in southern California from climate and terrain‐derived variables , 1998 .

[31]  A. Prasad,et al.  Atlas of current and potential future distributions of common trees of the eastern United States , 1999 .

[32]  Mark W. Schwartz,et al.  Modeling potential future individual tree-species distributions in the eastern United States under a climate change scenario: a case study with Pinus virginiana , 1999 .

[33]  Douglas M. Bates,et al.  Programming With Data: A Guide to the S Language , 1999, Technometrics.

[34]  G. Boer,et al.  A transient climate change simulation with greenhouse gas and aerosol forcing: projected climate to the twenty-first century , 2000 .

[35]  G. De’ath,et al.  CLASSIFICATION AND REGRESSION TREES: A POWERFUL YET SIMPLE TECHNIQUE FOR ECOLOGICAL DATA ANALYSIS , 2000 .

[36]  R. Pontius QUANTIFICATION ERROR VERSUS LOCATION ERROR IN COMPARISON OF CATEGORICAL MAPS , 2000 .

[37]  Chengquan Huang,et al.  Enhanced algorithm performance for land cover classification from remotely sensed data using bagging and boosting , 2001, IEEE Trans. Geosci. Remote. Sens..

[38]  Mark W. Schwartz,et al.  Predicting the Potential Future Distribution of Four Tree Species in Ohio Using Current Habitat Availability and Climatic Forcing , 2001, Ecosystems.

[39]  Roger White,et al.  Hierarchical fuzzy pattern matching for the regional comparison of land use maps , 2001, Int. J. Geogr. Inf. Sci..

[40]  Ajith Abraham,et al.  MARS: Still an Alien Planet in Soft Computing? , 2001, International Conference on Computational Science.

[41]  P. Bühlmann,et al.  Analyzing Bagging , 2001 .

[42]  A Hagen Comparison of maps containing nominal data , 2002 .

[43]  Louis R. Iverson,et al.  Potential redistribution of tree species habitat under five climate change scenarios in the eastern US , 2002 .

[44]  R. Neilson,et al.  Estimated migration rates under scenarios of global climate change , 2002 .

[45]  Robert P. W. Duin,et al.  Bagging, Boosting and the Random Subspace Method for Linear Classifiers , 2002, Pattern Analysis & Applications.

[46]  Gretchen G. Moisen,et al.  Comparing five modelling techniques for predicting forest characteristics , 2002 .

[47]  José M. C. Pereira,et al.  The use of SPOT VEGETATION data in a classification tree approach for burnt area mapping in Australian savanna , 2003 .

[48]  Steven I. Higgins,et al.  Estimating plant migration rates under habitat loss and fragmentation , 2003 .

[49]  Cesare Furlanello,et al.  GIS and the Random Forest Predictor: Integration in R for Tick-Borne Disease Risk Assessment , 2003 .

[50]  Robert P. Sheridan,et al.  Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling , 2003, J. Chem. Inf. Comput. Sci..

[51]  Kurt Hornik,et al.  The support vector machine under test , 2003, Neurocomputing.

[52]  Alex Hagen,et al.  Fuzzy set approach to assessing similarity of categorical maps , 2003, Int. J. Geogr. Inf. Sci..

[53]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[54]  JAMES R. MILLER,et al.  Spatial Extrapolation: The Science of Predicting Ecological Patterns and Processes , 2004 .

[55]  Torsten Hothorn,et al.  Bagging survival trees , 2002, Statistics in medicine.

[56]  Jesús Muñoz,et al.  Comparison of statistical methods commonly used in predictive modelling , 2004 .

[57]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[58]  R. G. Pontlus Quantification Error Versus Location Error in Comparison of Categorical Maps , 2006 .

[59]  Andy Liaw,et al.  Classification and Regression by randomForest , 2007 .