Approximating Prediction Uncertainty for Random Forest Regression Models

Abstract: The use of machine learning approaches such as random forest has increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach and, unlike traditional regression approaches, provides no direct quantification of prediction error. Understanding prediction uncertainty is important when model-based continuous maps are used as inputs to other modeling applications such as fire modeling. Here we use a Monte Carlo approach to quantify prediction uncertainty for random forest regression models. We test the approach by simulating maps of dependent and independent variables with known characteristics and comparing actual errors with prediction errors. Our approach produced conservative prediction intervals across most of the range of predicted values. However, because the Monte Carlo approach was data driven, prediction intervals were either too wide or too narrow in sparse parts of the prediction distribution. Overall, our approach provides reasonable estimates of prediction uncertainty for random forest regression models.
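
The abstract does not spell out the Monte Carlo procedure, so the following is only a generic sketch of the idea, not the authors' method. It assumes a bootstrap-style scheme: resample the reference data many times, refit a tree ensemble each time, and combine each refit's prediction with a randomly drawn out-of-bag residual to build an empirical prediction interval. To stay dependency-free, a small bagged ensemble of one-dimensional regression trees stands in for a random forest. All function names (`fit_tree`, `fit_forest`, `monte_carlo_interval`, `simulate`) and parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def fit_tree(points, min_leaf=5):
    """Fit a 1-D regression tree; `points` is a list of (x, y) sorted by x."""
    n = len(points)
    ys = [y for _, y in points]
    mean_y = statistics.fmean(ys)
    if n < 2 * min_leaf:
        return mean_y  # leaf: predict the mean response
    total, total_sq = sum(ys), sum(y * y for y in ys)
    left_sum = left_sq = 0.0
    best_sse, best_i = None, None
    for i in range(1, n):
        left_sum += ys[i - 1]
        left_sq += ys[i - 1] ** 2
        if i < min_leaf or n - i < min_leaf or points[i][0] == points[i - 1][0]:
            continue
        # Sum of squared errors of both children, from running sums.
        sse = (left_sq - left_sum ** 2 / i) + (
            (total_sq - left_sq) - (total - left_sum) ** 2 / (n - i))
        if best_sse is None or sse < best_sse:
            best_sse, best_i = sse, i
    if best_i is None:
        return mean_y
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2
    return (threshold,
            fit_tree(points[:best_i], min_leaf),
            fit_tree(points[best_i:], min_leaf))


def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(data, n_trees, rng):
    """Bagging: each tree is fit to a bootstrap sample of `data`."""
    return [fit_tree(sorted(rng.choices(data, k=len(data))))
            for _ in range(n_trees)]


def predict_forest(forest, x):
    return statistics.fmean([predict_tree(t, x) for t in forest])


def monte_carlo_interval(data, x_new, n_iter=50, n_trees=10, level=0.9, seed=0):
    """Empirical prediction interval at `x_new` from repeated refits."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        idx = [rng.randrange(len(data)) for _ in data]
        in_bag = set(idx)
        forest = fit_forest([data[i] for i in idx], n_trees, rng)
        # Residuals on observations this refit never saw (out-of-bag).
        oob = [data[i] for i in range(len(data)) if i not in in_bag]
        if not oob:
            continue
        residuals = [y - predict_forest(forest, x) for x, y in oob]
        draws.append(predict_forest(forest, x_new) + rng.choice(residuals))
    draws.sort()
    tail = (1 - level) / 2
    lo = draws[int(tail * (len(draws) - 1))]
    hi = draws[int(round((1 - tail) * (len(draws) - 1)))]
    return lo, hi


def simulate(n=100, seed=1):
    """Simulated data with known characteristics: y = 2x + N(0, 1) noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]
```

Calling `monte_carlo_interval(simulate(), x_new=5.0)` returns a lower and upper bound that can be compared with the known true value of 10 at that point. Because the interval is built entirely from resampled data, it would be expected to behave poorly where training observations are sparse, which matches the limitation the abstract reports.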
