Leveraging Google Earth Engine (GEE) and machine learning algorithms to incorporate in situ measurement from different times for rangelands monitoring

Abstract: Mapping and monitoring indicators of soil cover, vegetation structure, and various native and non-native species is a critical aspect of rangeland management. With advances in satellite imagery and in cloud storage and computing, the capability now exists to conduct planetary-scale analysis, including mapping of rangeland indicators. Combined with recent investments in the collection of large amounts of in situ data in the western U.S., new approaches using machine learning can enable prediction of surface conditions at times and places where no in situ data are available. However, little analysis has yet been done on how the temporal relevancy of training data influences model performance. Here, we leveraged the Google Earth Engine (GEE) platform and a machine learning algorithm (Random Forest, selected after comparison with other candidates) to identify the potential impact of different sampling times (across months and years) on estimation of rangeland indicators from the Bureau of Land Management's (BLM) Assessment, Inventory, and Monitoring (AIM) and Landscape Monitoring Framework (LMF) programs. Our results indicate that temporally relevant training data improve predictions, though the training data need not be from the exact same month and year for a prediction to be temporally relevant. Moreover, including training data from the time when predictions are desired leads to lower prediction error, while adding training data from other times does not add to overall model error. Using all of the available training data can bias estimates toward the mean for times when indicator values are especially high or low. However, for mapping purposes, limiting training data to just the time when predictions are desired can lead to poor predictions of values outside the spatial range of the training data for that period. We conclude that the best Random Forest prediction maps will use training data from all possible times, with the understanding that estimates at the extremes will be biased.
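
The trade-off described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' GEE workflow: it uses synthetic data, a single hypothetical predictor, and a simplified pure-Python ensemble of bagged regression trees (the core idea behind Random Forest, without feature subsampling). All names, value ranges, and the linear `truth` function are assumptions made for illustration. Because each tree leaf stores the mean of some training targets, an ensemble trained on one narrow period cannot predict beyond the range of values seen in that period, whereas pooled training data cover more of the range.

```python
import random


def fit_tree(xs, ys, depth=0, max_depth=4, min_leaf=3):
    """Fit a 1-D regression tree; each leaf stores the mean of its targets."""
    if depth >= max_depth or len(xs) < 2 * min_leaf:
        return sum(ys) / len(ys)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    xs_s = [xs[i] for i in order]
    ys_s = [ys[i] for i in order]
    best = None
    # Try every split that leaves at least min_leaf samples on each side.
    for k in range(min_leaf, len(xs_s) - min_leaf + 1):
        if xs_s[k - 1] == xs_s[k]:
            continue  # cannot split between identical predictor values
        left, right = ys_s[:k], ys_s[k:]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = (sum((v - mean_l) ** 2 for v in left)
               + sum((v - mean_r) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, k)
    if best is None:
        return sum(ys) / len(ys)
    k = best[1]
    threshold = (xs_s[k - 1] + xs_s[k]) / 2
    return (threshold,
            fit_tree(xs_s[:k], ys_s[:k], depth + 1, max_depth, min_leaf),
            fit_tree(xs_s[k:], ys_s[k:], depth + 1, max_depth, min_leaf))


def predict_tree(tree, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Bagging: fit each tree on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)


def truth(x):
    """Hypothetical indicator (e.g. percent cover) as a function of a predictor."""
    return 100.0 * x


data_rng = random.Random(42)


def sample(lo, hi, n):
    """Draw noisy synthetic 'field plots' with predictor values in [lo, hi]."""
    xs = [data_rng.uniform(lo, hi) for _ in range(n)]
    ys = [truth(x) + data_rng.gauss(0, 3) for x in xs]
    return xs, ys


# One sampling period covering a narrow range of conditions (assumed).
x_one, y_one = sample(0.2, 0.5, 100)
# Plots from other periods covering the full range (assumed).
x_other, y_other = sample(0.0, 1.0, 200)
x_pool, y_pool = x_one + x_other, y_one + y_other

forest_one = fit_forest(x_one, y_one)
forest_pool = fit_forest(x_pool, y_pool)

x_new = 0.9  # a location whose conditions fall outside the single period's range
pred_one = predict_forest(forest_one, x_new)
pred_pool = predict_forest(forest_pool, x_new)

print("true value:", truth(x_new))
print("single-period model:", round(pred_one, 1))
print("pooled model:", round(pred_pool, 1))
```

In this sketch the single-period model can never exceed the largest indicator value in its own training set, so it underestimates at `x_new`, while the pooled model has seen comparable conditions. The same averaging is also why pooled models tend to pull estimates toward the mean at the extremes of the training range.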
