Wild blueberry yield prediction using a combination of computer simulation and machine learning algorithms

Abstract Accurately predicting crop yield is one of the most challenging tasks in the agricultural sector. Machine learning algorithms for yield prediction are typically trained on real field data. In this study, we instead used data generated by the Wild Blueberry Pollination Model, a spatially explicit simulation model validated against field observations and experimental data collected in Maine, USA, over the last 30 years. The main aim of this study is to evaluate the relative importance of bee species composition and weather factors in regulating wild blueberry agroecosystems. Specifically, we sought to reveal how bee species composition and weather affect yield, and to predict the bee species composition and weather conditions that achieve the best yield, using computer simulation and machine learning algorithms. Multiple linear regression (MLR), boosted decision trees (BDT), random forest (RF), and extreme gradient boosting (XGBoost) were evaluated as predictive tools. We also performed predictor selection before submitting the data to the learning algorithms, which reduced the dimension of the input without a significant drop in prediction accuracy. Clone size, honeybee, bumblebee, Andrena bee species, Osmia bee species, the maximum of the upper temperature range, and the number of days with precipitation were chosen as the best predictor subset. XGBoost outperformed the other algorithms on all measures of model performance for predicting wild blueberry yield, achieving a coefficient of determination (R²) of 0.938, a root mean square error (RMSE) of 343.026, a mean absolute error (MAE) of 206, and a relative root mean square error (RRMSE) of 5.444%. These results are consistent with previous work on predicting wild blueberry fruit yield using digital color photography (Zaman et al., 2008). This study showed that crop yield predictions can be based on datasets produced by computer simulation models. Where such predictions are reasonably accurate, this approach could have a significant impact, especially when data collection in the field is challenging.
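
The four performance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, and RRMSE is assumed here to be RMSE divided by the mean of the observed values, expressed as a percentage, since the abstract does not state the normalization.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE and RRMSE (%) for paired observed/predicted values.

    Assumption: RRMSE = 100 * RMSE / mean(observed). The abstract does not
    define the normalization, so this is one common convention.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "mae": mae,
        "rrmse_pct": 100.0 * rmse / mean_obs,
    }


# Made-up yields for illustration only; not data from the study.
observed = [5000.0, 6000.0, 7000.0, 8000.0]
predicted = [5200.0, 5900.0, 7100.0, 7800.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Under the same assumed definition, the reported RMSE of 343.026 and RRMSE of 5.444% would imply a mean simulated yield of roughly 6,300 in the units of RMSE; this is an inference from the reported figures, not a value stated in the abstract.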

[1]  Hong Wan,et al.  Work smarter, not harder: A tutorial on designing and conducting simulation experiments , 2012, 2015 Winter Simulation Conference (WSC).

[2]  R. Bhargavi,et al.  Optimum Feature Subset for Optimizing Crop Yield Prediction Using Filter and Wrapper Approaches , 2019 .

[3]  O. Mutanga,et al.  Evaluating the utility of the medium-spatial resolution Landsat 8 multispectral sensor in quantifying aboveground biomass in uMgeni catchment, South Africa , 2015 .

[4]  F. Drummond,et al.  Pollen-mediated gene flow in managed fields of lowbush blueberry , 2019, Canadian Journal of Plant Science.

[5]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[6]  Qamar Uz Zaman,et al.  Estimation of Wild Blueberry Fruit Yield Using Digital Color Photography , 2008 .

[7]  Alex J. Cannon,et al.  Maize yield forecasting by linear regression and artificial neural networks in Jilin, China , 2014, The Journal of Agricultural Science.

[8]  Philippe Aras,et al.  Effect of a honey bee (Hymenoptera : Apidae) gradient on the pollination and yield of lowbush blueberry , 1996 .

[9]  P. Ojiambo,et al.  Predicting Pre-planting Risk of Stagonospora nodorum blotch in Winter Wheat Using Machine Learning Models , 2016, Front. Plant Sci..

[10]  J. Ascher,et al.  A Natural History of Change in Native Bees Associated with Lowbush Blueberry in Maine , 2017, Northeastern Naturalist.

[11]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[12]  Andreas Tolk,et al.  The next generation of modeling & simulation: integrating big data and deep learning , 2015, SummerSim.

[13]  Thomas W. Lucas,et al.  Defense and homeland security applications of multi-agent simulations , 2007, 2007 Winter Simulation Conference.

[14]  Clayton M. Hodges Optimal foraging in bumblebees: Hunting by expectation , 1981, Animal Behaviour.

[15]  Hongchun Qu,et al.  A spatially explicit agent-based simulation platform for investigating effects of shared pollination service on ecological communities , 2013, Simul. Model. Pract. Theory.

[16]  Hongbin Liu,et al.  General models for estimating daily global solar radiation for different solar radiation zones in mainland China , 2013 .

[17]  D. Hiebeler,et al.  Grid-Set-Match, an agent-based simulation model, predicts fruit set for the lowbush blueberry (Vaccinium angustifolium) agroecosystem , 2017 .

[18]  L J Francl,et al.  Neural network classification of tan spot and stagonospora blotch infection periods in a wheat field environment. , 2000, Phytopathology.

[19]  F. Drummond,et al.  The Ecology of Autogamy in Wild Blueberry (Vaccinium angustifolium Aiton): Does the Early Clone Get the Bee? , 2020, Agronomy.

[20]  Kenneth A. Sudduth,et al.  STATISTICAL AND NEURAL METHODS FOR SITE–SPECIFIC YIELD PREDICTION , 2003 .

[21]  Nigel Gilbert,et al.  Holism, Individualism and Emergent Properties , 1996 .

[22]  Elizabeth A. Peck,et al.  Introduction to Linear Regression Analysis , 2001 .

[23]  Sotirios Archontoulis,et al.  Development of a nitrogen recommendation tool for corn considering static and dynamic variables , 2019, European Journal of Agronomy.

[24]  Bernadine C. Strik,et al.  Blueberry Production Trends in North America, 1992 to 2003, and Predictions for Growth , 2005 .

[25]  Tianqi Chen,et al.  XGBoost: A Scalable Tree Boosting System , 2016, KDD.

[26]  Harris Drucker,et al.  Improving Regressors using Boosting Techniques , 1997, ICML.

[27]  Salah Sukkarieh,et al.  Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review , 2018, Comput. Electron. Agric..

[28]  Benjamin Peherstorfer,et al.  Analysis of Car Crash Simulation Data with Nonlinear Machine Learning Methods , 2013, ICCS.

[29]  Mohsen Shahhosseini,et al.  Maize yield and nitrate loss prediction with machine learning algorithms , 2019, Environmental Research Letters.

[30]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[31]  Jonathan P. Resop,et al.  Random Forests for Global and Regional Crop Yield Predictions , 2016, PloS one.

[32]  Lei Zhang,et al.  Using boosted tree regression and artificial neural networks to forecast upland rice yield under climate change in Sahel , 2019, Comput. Electron. Agric..

[33]  Chris Murphy,et al.  APSIM - Evolution towards a new generation of agricultural systems simulation , 2014, Environ. Model. Softw..

[34]  B. Ji,et al.  Artificial neural networks for rice yield prediction in mountainous regions , 2007, The Journal of Agricultural Science.

[35]  F. Drummond,et al.  A global review of arthropod-mediated ecosystem-services in Vaccinium berry agroecosystems , 2014 .

[36]  J. Stommel,et al.  Yield Variation among Clones of Lowbush Blueberry as a Function of Genetic Similarity and Self-compatibility , 2010 .

[37]  J. Stoorvogel,et al.  Comparison of Three Modelling Approaches to Simulate Regional Crop Yield: A Case Study of Winter Wheat Yield in Western Germany , 2016 .

[38]  S. Chakraborty,et al.  Weather-based prediction of anthracnose severity using artificial neural network models , 2004 .

[39]  Frank Drummond,et al.  Simulation-based modeling of wild blueberry pollination , 2018, Comput. Electron. Agric..

[40]  M. Seifan,et al.  Effects of plant and pollinator traits on the maintenance of a food deceptive species within a plant community , 2017 .

[41]  J Elith,et al.  A working guide to boosted regression trees. , 2008, The Journal of animal ecology.

[42]  A. Crane-Droesch Machine learning methods for crop yield prediction and climate change impact assessment in agriculture , 2018, Environmental Research Letters.

[43]  F. Drummond Reproductive Biology of Wild Blueberry (Vaccinium angustifolium Aiton) , 2019, Agriculture.

[44]  F. Drummond Behavior of Bees Associated with the Wild Blueberry Agro-ecosystem in the USA , 2016 .

[45]  Timothy W. Simpson,et al.  Metamodels for Computer-based Engineering Design: Survey and recommendations , 2001, Engineering with Computers.

[46]  Alex J. Cannon,et al.  Crop yield forecasting on the Canadian Prairies by remotely sensed vegetation indices and machine learning methods , 2016 .

[47]  Kalyan Veeramachaneni,et al.  The Synthetic Data Vault , 2016, 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA).

[48]  E. Asare,et al.  Economic Risk of Bee Pollination in Maine Wild Blueberry, Vaccinium angustifolium , 2017, Journal of Economic Entomology.

[49]  Hamdy K. Elminir,et al.  ESTIMATION OF AIR POLLUTANT CONCENTRATIONS FROM METEOROLOGICAL PARAMETERS USING ARTIFICIAL NEURAL NETWORK , 2006 .

[50]  Matthias Klusch,et al.  Digital reality: a model-based approach to supervised learning from synthetic data , 2019, AI Perspectives.

[51]  T. Peever,et al.  Predicting Ascospore Release of Monilinia vaccinii-corymbosi of Blueberry with Machine Learning. , 2017, Phytopathology.