Utah State University From the SelectedWorks of

Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) the ability to model complex interactions among predictor variables; (4) the flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracy of RF with that of four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA; rare lichen species presence in the Pacific Northwest, USA; and nest sites for cavity-nesting birds in the Uinta Mountains, Utah, USA. In all three applications, RF achieved high classification accuracy relative to the other classifiers, as measured by cross-validation and, for the lichen data, by independent test data. The variables that RF identified as most important for classifying invasive plant species also coincided with expectations based on the literature.
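The abstract does not describe how an RF classifier is built, so the following is only a minimal sketch of the general idea, assuming the standard recipe: each tree is grown on a bootstrap sample, each split considers a random subset of predictors, and the forest predicts by majority vote. It is not the authors' implementation or any library's API. All function names, the synthetic presence/absence data, and the parameter values are hypothetical. Accuracy is estimated here from out-of-bag observations rather than by the cross-validation used in the paper, and variable importance is omitted for brevity.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, rows, features):
    """Return (score, feature, threshold) minimizing weighted Gini, or None."""
    best = None
    for f in features:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(X, y, rows, mtry, rng, depth=0, max_depth=8):
    """Grow one tree; labels must not be tuples (tuples mark internal nodes)."""
    labels = [y[i] for i in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority
    # Random subset of predictors considered at this split.
    features = rng.sample(range(len(X[0])), mtry)
    split = best_split(X, y, rows, features)
    if split is None:  # sampled predictors are constant in this node
        return majority
    _, f, t = split
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (f, t,
            grow_tree(X, y, left, mtry, rng, depth + 1, max_depth),
            grow_tree(X, y, right, mtry, rng, depth + 1, max_depth))

def predict_tree(node, x):
    """Follow splits until a leaf (a class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def random_forest_oob(X, y, n_trees=30, mtry=None, seed=0):
    """Fit a forest and return (trees, out-of-bag accuracy)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))
    votes = [Counter() for _ in range(n)]
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = grow_tree(X, y, boot, mtry, rng)
        forest.append(tree)
        in_bag = set(boot)
        for i in range(n):
            if i not in in_bag:  # only trees that never saw i may vote on it
                votes[i][predict_tree(tree, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    correct = sum(votes[i].most_common(1)[0][0] == y[i] for i in scored)
    return forest, correct / len(scored)

# Synthetic presence/absence data (hypothetical): presence depends jointly
# on the first two predictors; the other two are noise.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(4)] for _ in range(120)]
y = [1 if x[0] + x[1] > 1.0 else 0 for x in X]

forest, oob_acc = random_forest_oob(X, y)
print(f"trees: {len(forest)}, out-of-bag accuracy: {oob_acc:.2f}")
```

Because each observation is left out of roughly a third of the bootstrap samples, the out-of-bag vote gives an internal accuracy estimate without a separate hold-out set. A comparison like the one described above would still need cross-validation or independent test data applied identically to every classifier.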
