Maize yield and nitrate loss prediction with machine learning algorithms

Pre-season prediction of crop production outcomes such as grain yields and N losses can provide insights to stakeholders when making decisions. Simulation models can assist in scenario planning, but their use is limited by their data requirements and long run times. Thus, there is a need for more computationally expedient approaches to scale up predictions. We evaluated the potential of five machine learning (ML) algorithms as meta-models for a cropping systems simulator (APSIM) to inform future decision-support tool development. We asked: 1) How well do ML meta-models predict maize yield and N losses using pre-season information? 2) How much data is needed to train ML algorithms to achieve acceptable predictions? 3) Which input variables are most important for accurate prediction? and 4) Do ensembles of ML meta-models improve prediction? The simulated dataset included more than 3 million genotype, environment and management scenarios. Random forests most accurately predicted maize yield and N loss at planting time, with relative root mean square errors (RRMSE) of 14% and 55%, respectively. ML meta-models reasonably reproduced simulated maize yields but not N loss. They also differed in their sensitivity to the size of the training dataset. Across all ML models, yield prediction error decreased by 10-40% as the training dataset increased from 0.5 to 1.8 million data points, whereas N loss prediction error showed no consistent pattern. ML models also differed in their sensitivity to input variables. Averaged across all ML models, weather conditions, soil properties, management information and initial conditions were roughly equally important when predicting yields. ML ensembles yielded only modest prediction improvements. These results can help accelerate progress in coupling simulation models and ML toward developing dynamic decision support tools for pre-season management.
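As a rough illustration of the error metric and the ensemble idea mentioned above, the following minimal sketch computes RRMSE and a simple unweighted average of two meta-model predictions. It assumes RRMSE is the RMSE divided by the mean of the simulated values, expressed as a percentage; the abstract does not state the exact formula or the ensemble scheme used, and the function names and numbers here are hypothetical.

```python
import numpy as np


def rrmse(simulated, predicted):
    """Relative RMSE in percent (assumed definition: RMSE / mean of simulated values)."""
    simulated = np.asarray(simulated, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((predicted - simulated) ** 2))
    return 100.0 * rmse / np.mean(simulated)


def average_ensemble(predictions):
    """Unweighted mean of several models' predictions (one possible ensemble scheme)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)


# Made-up yields for illustration only; these are not values from the study.
simulated_yield = np.array([10.0, 12.0, 8.0, 10.0])
model_a = np.array([11.0, 11.0, 9.0, 9.0])
model_b = np.array([9.0, 12.0, 8.0, 10.0])
ensemble = average_ensemble([model_a, model_b])

for name, pred in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(f"{name}: RRMSE = {rrmse(simulated_yield, pred):.2f}%")
```

In this toy case the averaged prediction has a lower RRMSE than either model alone because their errors partly cancel; whether that holds in practice depends on how correlated the individual models' errors are.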
