Machine learning approaches can reduce environmental data requirements for regional yield potential simulation

Abstract: Crop growth models (CGMs) are widely used to study the impacts of regional climate change on crop growth and yield. However, they require high-quality daily weather data for model setup and verification, and limited data availability constrains their application at the regional scale. With the rapid development of machine learning, data-driven algorithms can be used to build a meta-model from environmental feature variables at the desired spatio-temporal scale, providing a new method for regional yield simulation. In this study, we developed four machine-learning-based meta-models to simulate the regional yield potential (RYP) of wheat in China from selected environmental feature variables, using multiple linear regression (MLR), artificial neural networks (ANN), random forest (RF), and support vector regression (SVR). The aim was to verify whether these meta-models can outperform the CGM in simulating RYP. The results showed that the meta-models reduced both the number of input variables and the amount of data required for RYP simulation while maintaining simulation accuracy, because monthly weather variables could replace daily weather variables. Although all four meta-models reproduced the mean RYP well, the RF-based meta-model performed best. In the RF-based meta-model, longitude, latitude, altitude, and the average maximum temperature in March were the four most important variables. However, the generalizability of the meta-models depends on the training dataset, and the meta-models do not adapt well to new, unseen data. Moreover, no single machine learning algorithm is optimal for building meta-models, which will increase the workload of similar research in the future.
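The replacement of daily weather inputs by monthly ones can be illustrated with a minimal sketch. The abstract does not specify the study's feature pipeline, so everything here is an assumption made for illustration: the function name `monthly_features`, the record layout (date, maximum temperature, minimum temperature, radiation), and the synthetic values are all hypothetical. The sketch only shows how a daily series collapses into one row of monthly means, such as the average maximum temperature in March.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def monthly_features(daily_records):
    """Aggregate daily weather records into monthly mean features.

    daily_records: iterable of (date, tmax_c, tmin_c, radiation) tuples.
    Returns {(year, month): {"tmax_mean", "tmin_mean", "rad_mean", "n_days"}}.
    The record layout is a hypothetical one chosen for this sketch.
    """
    # Group the daily values by calendar month.
    buckets = defaultdict(list)
    for day, tmax, tmin, rad in daily_records:
        buckets[(day.year, day.month)].append((tmax, tmin, rad))

    # Reduce each month to the mean of each variable.
    features = {}
    for key, rows in sorted(buckets.items()):
        tmax_vals, tmin_vals, rad_vals = zip(*rows)
        features[key] = {
            "tmax_mean": mean(tmax_vals),
            "tmin_mean": mean(tmin_vals),
            "rad_mean": mean(rad_vals),
            "n_days": len(rows),
        }
    return features


# Synthetic daily series for February and March 2019 (illustrative values only).
start = date(2019, 2, 1)
daily = [
    (start + timedelta(days=i), 10.0 + 0.1 * i, 0.1 * i, 12.0)
    for i in range(59)  # 28 days of February + 31 days of March
]

monthly = monthly_features(daily)
march = monthly[(2019, 3)]
print(march["n_days"], round(march["tmax_mean"], 2))
```

In this sketch, 59 daily records reduce to two monthly rows. Combined with static site descriptors such as longitude, latitude, and altitude, rows of this kind would be the sort of input a meta-model is trained on.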
