Predictive ability of machine learning methods for massive crop yield prediction

Accurate yield estimation for the many crops involved is an important issue in agricultural planning. Machine learning (ML) is an essential approach for achieving practical and effective solutions to this problem. Many comparisons of ML methods for yield prediction have been made in search of the most accurate technique. Generally, however, the number of evaluated crops and techniques is too low to provide enough information for agricultural planning purposes. This paper compares the predictive accuracy of ML and linear regression techniques for crop yield prediction on ten crop datasets. Multiple linear regression, M5-Prime regression trees, multilayer perceptron neural networks, support vector regression, and k-nearest neighbor methods were ranked. Four accuracy metrics were used to validate the models: the root mean square error (RMSE), root relative square error (RRSE), normalized mean absolute error (MAE), and correlation factor (R). Real data from an irrigation zone in Mexico were used to build the models, which were tested with samples from two consecutive years. The results show that the M5-Prime and k-nearest neighbor techniques obtain the lowest average RMSE (5.14 and 4.91), the lowest RRSE (79.46% and 79.78%), the lowest average MAE (18.12% and 19.42%), and the highest average correlation factors (0.41 and 0.42). Because M5-Prime achieves the largest number of crop yield models with the lowest errors, it is a very suitable tool for massive crop yield prediction in agricultural planning.
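The four accuracy metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions below are assumptions: RRSE follows the common convention of comparing the squared error against that of a predictor that always outputs the mean of the observed values, the normalized MAE is taken as the mean absolute error divided by the mean observed yield, and R is the Pearson correlation coefficient. The function name `yield_metrics` and the sample values are hypothetical and are not taken from the paper's data.

```python
import math


def yield_metrics(actual, predicted):
    """Return RMSE, RRSE (%), normalized MAE (%) and Pearson R.

    Assumed definitions (not stated in the abstract):
      RRSE = 100 * sqrt(sum((p - a)^2) / sum((a - mean(a))^2))
      MAE  = 100 * mean(|p - a|) / mean(a)
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    # Error sums over paired observations.
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))

    # Spread of the observed and predicted values around their means.
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))

    return {
        "rmse": math.sqrt(sq_err / n),
        "rrse": 100.0 * math.sqrt(sq_err / ss_a),
        "mae": 100.0 * (abs_err / n) / mean_a,
        "r": cov / math.sqrt(ss_a * ss_p),
    }


# Hypothetical yields (e.g. t/ha); every prediction is off by exactly 1.
example = yield_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 5.0, 7.0, 9.0])
print(example)
```

Under these definitions an RRSE below 100% means the model beats the mean predictor, which is one way to read the reported values of roughly 79%.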
