Assimilating optical satellite remote sensing images and field data to predict surface indicators in the Western U.S.: Assessing error in satellite predictions based on large geographical datasets with the use of machine learning

Abstract
Indicators of vegetation composition, vegetation structure, bare ground cover, and gap size in drylands can provide information about ecosystem condition, in part because they are strongly related to factors such as erosion, wildlife habitat characteristics, and the suitability of land for particular uses. Point-based field data collection does not produce spatially continuous information about surface indicators and cannot cover vast geographic areas. Remote sensing offers a potentially labor- and time-saving means of estimating important biophysical indicators of vegetation and surface condition at temporal and spatial scales that field methods cannot achieve. Regression models based on machine learning algorithms, such as random forest (RF), can establish relationships between field and remotely sensed data while also providing error estimates. In this study, field data from over 15,000 points collected by the Assessment, Inventory, and Monitoring (AIM) and Landscape Monitoring Framework (LMF) programs on Bureau of Land Management (BLM) lands throughout the Western U.S. were modeled in a k-fold cross-validation approach to RF regression. The predictor variables were Moderate Resolution Imaging Spectroradiometer (MODIS) bidirectional reflectance distribution function (BRDF) parameters, MODIS nadir BRDF-adjusted reflectance (NBAR), and Landsat 8 Operational Land Imager (OLI) surface reflectance products, together with ancillary data. RF regression models were built to predict fourteen indicators of vegetation cover and height, as well as bare gap parameters. The RF model estimates correlated well with independent samples and showed low bias and low RMSE. External cross-validation agreed well with the out-of-bag (OOB) errors produced by RF and also allowed prediction uncertainty to be mapped. These relationships were then applied across the arid and semiarid Western U.S. to produce maps of the predicted surface indicators. Maps of bias and RMSE show that insufficient and unevenly distributed samples strongly affect the accuracy of RF regression and prediction. The results of this study clearly show the utility of RF for estimating multiple dryland surface indicators from remotely sensed data, and the reliability of OOB errors for assessing prediction accuracy.
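
The error-assessment workflow summarized above (RF regression, external k-fold cross-validation, comparison against RF's out-of-bag error, and bias/RMSE summaries per spatial unit) can be sketched in Python with scikit-learn's `RandomForestRegressor` and `KFold`. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for field-measured indicators and remotely sensed predictors; the helper names `bias_rmse`, `cv_vs_oob`, and `error_by_group` are hypothetical; the hyperparameters (`n_splits=5`, `n_estimators=150`) are arbitrary; bias is assumed to be mean(predicted minus observed); and the `groups` labels are a hypothetical proxy for the map units (such as ecoregions or grid cells) over which bias and RMSE would be mapped.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def bias_rmse(y_true, y_pred):
    """Return (bias, RMSE); bias is assumed here to be mean(predicted - observed)."""
    resid = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(resid.mean()), float(np.sqrt(np.mean(resid ** 2)))


def cv_vs_oob(X, y, n_splits=5, n_estimators=150, seed=0):
    """Compare external k-fold cross-validation error with RF out-of-bag error."""
    oof = np.empty(len(y), dtype=float)  # one out-of-fold prediction per sample
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        rf.fit(X[train_idx], y[train_idx])
        oof[test_idx] = rf.predict(X[test_idx])  # predict only the held-out fold
    # One forest on all samples: each OOB prediction averages only the trees
    # whose bootstrap sample did not include that sample.
    full = RandomForestRegressor(n_estimators=n_estimators, oob_score=True,
                                 random_state=seed)
    full.fit(X, y)
    oob_pred = np.ravel(full.oob_prediction_)
    return oof, bias_rmse(y, oof), bias_rmse(y, oob_pred)


def error_by_group(y, pred, groups):
    """Sample count, bias, and RMSE per group (a stand-in for map units)."""
    summary = {}
    for g in np.unique(groups):
        mask = groups == g
        bias, rmse = bias_rmse(y[mask], pred[mask])
        summary[str(g)] = (int(mask.sum()), bias, rmse)
    return summary


# Hypothetical synthetic data: 3 predictors, with uneven sampling along X[:, 0]
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(500, 3))
X[:30, 0] = rng.uniform(0.8, 1.0, size=30)   # few samples in this part of predictor space
X[30:, 0] = rng.uniform(0.0, 0.8, size=470)  # most samples fall here
y = 40 * X[:, 0] ** 2 + 15 * X[:, 1] - 10 * X[:, 2] + rng.normal(0.0, 2.0, size=500)
groups = np.where(X[:, 0] >= 0.8, "sparse", "dense")

oof, cv_err, oob_err = cv_vs_oob(X, y)
by_group = error_by_group(y, oof, groups)
print("CV  bias/RMSE:", cv_err)
print("OOB bias/RMSE:", oob_err)
for name, (n, bias, rmse) in by_group.items():
    print(f"{name:6s} n={n:3d} bias={bias:+.2f} RMSE={rmse:.2f}")
```

Because the synthetic `sparse` group has only 30 samples in a steep, edge region of predictor space, its RMSE is expected to exceed that of the `dense` group, loosely mirroring the point about sample insufficiency; the exact values depend on the random seed.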
