Using phenology-based enhanced vegetation index and machine learning for soybean yield estimation in Paraná State, Brazil

Abstract. Accurate and timely regional estimates of agricultural production are key for decision makers. This study examines how different machine learning techniques affect soybean yield estimation when extracting maximum information from remotely sensed MODIS enhanced vegetation index (EVI) data constrained by phenology. Specifically, a methodology is developed for incorporating phenological information aligned with EVI acquisition for each pixel and for selecting the most significant of 36 candidate predictors using feature selection. These predictors were then used in four machine learning algorithms (MLAs) to obtain soybean yield estimates for observed farms in Paraná State, Brazil. The optimal MLA was then applied to the whole state to obtain regional soybean yield. The gradient boosting model (GBM) with all 36 predictors performed well, with a mean difference of 3.5 kg ha⁻¹, an RMSD of 373 kg ha⁻¹, and a Willmott's d of 0.85. However, the random forest (RF) algorithm using five optimal EVI predictors gave similar results with considerably less computational time. Both GBM and RF produced regional yields higher than the officially reported yields, by 1775 × 10³ and 2059 × 10³ metric tons, respectively. Considering both accuracy and computational performance, the RF with five EVI predictors provided the best results for regional soybean estimation.
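The three agreement statistics quoted in the abstract (mean difference, RMSD, and Willmott's d) can be illustrated with a minimal sketch. The function names and the sample yields below are hypothetical and are not taken from the study. The sign convention for the mean difference (predicted minus observed) is an assumption, and Willmott's d is written in its original 1981 form, d = 1 − Σ(Pᵢ − Oᵢ)² / Σ(|Pᵢ − Ō| + |Oᵢ − Ō|)², which may differ from the exact variant the authors used.

```python
import math


def mean_difference(observed, predicted):
    """Average of (predicted - observed); the sign convention is an assumption."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def rmsd(observed, predicted):
    """Root mean square difference between predicted and observed values."""
    squared = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def willmott_d(observed, predicted):
    """Willmott's index of agreement (1981 form); 1.0 means perfect agreement."""
    o_mean = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - o_mean) + abs(o - o_mean)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - numerator / denominator


# Hypothetical farm-level yields in kg/ha, for illustration only.
observed_yield = [3000.0, 3200.0, 3400.0]
predicted_yield = [3100.0, 3200.0, 3300.0]

md_value = mean_difference(observed_yield, predicted_yield)
rmsd_value = rmsd(observed_yield, predicted_yield)
d_value = willmott_d(observed_yield, predicted_yield)
print(md_value, rmsd_value, d_value)
```

A mean difference near zero alongside a large RMSD indicates errors that cancel on average but are large for individual farms, which is why the abstract reports the statistics together.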
