Yield forecasting with machine learning and small data: What gains for grains?

Abstract

Forecasting crop yields is important for food security, in particular to predict where crop production is likely to drop. Climate records and remotely sensed data have become instrumental sources of data for crop yield forecasting systems. Similarly, machine learning methods are increasingly used to process big Earth observation data. However, access to the data necessary to train such algorithms is often limited in food-insecure countries. Here, we evaluate how well machine learning algorithms trained on small data forecast yield on a monthly basis between the start and the end of the growing season. To do so, we developed a robust and automated machine-learning pipeline that selects the best features and model for prediction. Taking Algeria as a case study, we predicted national yields for barley, soft wheat and durum wheat with an accuracy of 0.16–0.2 t/ha (13–14% of mean yield) within the season. The best machine-learning models always outperformed simple benchmark models. This was confirmed in low-yielding years, which is particularly relevant for early warning. Nonetheless, the differences in accuracy between machine learning and benchmark models were not always of practical significance. Moreover, the benchmark models outperformed up to 60% of the machine learning models that were tested, which stresses the importance of proper model calibration and selection. For crop yield forecasting, as in many application domains, machine learning has delivered significant improvement in predictive power. Nonetheless, superiority over simple benchmarks is fully achieved only after extensive calibration, especially when dealing with small data.
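The idea of comparing calibrated machine-learning candidates against a simple benchmark on a small sample can be illustrated with a minimal sketch. This is not the pipeline used in the study: the data are synthetic, the candidate models, hyperparameters, feature-selection step (`SelectKBest`) and the mean-yield benchmark are all assumptions chosen for illustration, and the helper names `loo_rmse` and `with_selection` are hypothetical. Feature selection sits inside each scikit-learn pipeline so that it is refit within every leave-one-out fold, which avoids leaking information from the held-out year.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for a small yield data set (all values are made up):
# 18 "years" of 12 predictors, with yield in t/ha driven by two of them.
rng = np.random.default_rng(0)
n_years, n_features = 18, 12
X = rng.normal(size=(n_years, n_features))
y = 1.4 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=n_years)


def loo_rmse(model, X, y):
    """Root-mean-square error of leave-one-year-out predictions."""
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return float(np.sqrt(np.mean((y - pred) ** 2)))


def with_selection(estimator, k=4):
    """Scale, keep the k best features, then fit the estimator.

    Every step is refit inside each cross-validation fold.
    """
    return make_pipeline(StandardScaler(), SelectKBest(f_regression, k=k), estimator)


# Hypothetical candidate set: one naive benchmark and three ML models.
candidates = {
    "benchmark_mean": DummyRegressor(strategy="mean"),
    "lasso": with_selection(Lasso(alpha=0.01)),
    "svr": with_selection(SVR(kernel="rbf", C=1.0, epsilon=0.05)),
    "random_forest": with_selection(
        RandomForestRegressor(n_estimators=200, random_state=0)
    ),
}

# Score every candidate the same way and keep the one with the lowest error.
scores = {name: loo_rmse(model, X, y) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)

for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    share = 100 * rmse / y.mean()
    print(f"{name:15s} RMSE = {rmse:.3f} t/ha ({share:.1f}% of mean yield)")
print("selected:", best_name)
```

Ranking the benchmark alongside the machine-learning candidates, rather than reporting only the winner, makes it visible when a tuned model fails to beat the naive predictor. With so few samples, a stricter design would nest the model selection inside an outer validation loop so that the reported error is not biased by the selection itself.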
