Predicting yield performance of parents in plant breeding: A neural collaborative filtering approach

Experimental corn hybrids are created in plant breeding programs by crossing two parents, called an inbred and a tester. Identifying the best parent combinations for crossing is challenging because the number of possible cross combinations is large and testing all of them is impractical given limited time and budget. In the 2020 Syngenta Crop Challenge, Syngenta released several large datasets recording the historical yield performance of around 4% of the total cross combinations of 593 inbreds with 496 testers, planted in 280 locations between 2016 and 2018. Participants were asked to predict the yield performance of cross combinations that have not been planted, based on the historical yield data collected from crossing other inbreds and testers. In this paper, we present a collaborative filtering method, an ensemble of a matrix factorization method and a neural network, to solve this problem. Our computational results suggest that the proposed model significantly outperformed other models such as deep factorization machines (DeepFM), generalized matrix factorization (GMF), LASSO, random forest (RF), and neural networks. The presented method and results were produced within the 2020 Syngenta Crop Challenge.
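As a rough illustration of the collaborative filtering idea, the sketch below fits a plain matrix factorization to a sparsely observed inbred-by-tester yield matrix and predicts held-out crosses. It covers only the matrix factorization ingredient, not the ensemble with a neural network described above, and it is not the authors' implementation. The data are synthetic, and the matrix size, observed fraction, rank, learning rate, and regularization strength are all assumptions chosen so the example runs quickly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the challenge data (all sizes and values are assumptions).
n_inbreds, n_testers, rank = 120, 100, 3
true_p = rng.standard_normal((n_inbreds, rank))
true_q = rng.standard_normal((n_testers, rank))
noise = 0.5 * rng.standard_normal((n_inbreds, n_testers))
full = 100.0 + true_p @ true_q.T + noise  # hypothetical yield of every cross

# Only a fraction of crosses is ever planted; here 20% are observed (assumed).
observed = np.argwhere(rng.random((n_inbreds, n_testers)) < 0.20)
rng.shuffle(observed)  # shuffle the (inbred, tester) index pairs row-wise
split = int(0.8 * len(observed))
train, test = observed[:split], observed[split:]


def fit_mf(pairs, y, n_rows, n_cols, rank, lr=0.02, reg=0.05, epochs=60, seed=1):
    """Fit y ~ mu + P[i] . Q[j] by stochastic gradient descent on observed pairs."""
    g = np.random.default_rng(seed)
    mu = y.mean()
    P = 0.1 * g.standard_normal((n_rows, rank))  # latent factors per inbred
    Q = 0.1 * g.standard_normal((n_cols, rank))  # latent factors per tester
    for _ in range(epochs):
        for k in g.permutation(len(y)):
            i, j = pairs[k]
            err = y[k] - (mu + P[i] @ Q[j])
            p_old = P[i].copy()  # use the pre-update value for the Q step
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    return mu, P, Q


def predict(model, pairs):
    """Predict yield for an array of (inbred, tester) index pairs."""
    mu, P, Q = model
    return mu + np.sum(P[pairs[:, 0]] * Q[pairs[:, 1]], axis=1)


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


y_train = full[train[:, 0], train[:, 1]]
y_test = full[test[:, 0], test[:, 1]]
model = fit_mf(train, y_train, n_inbreds, n_testers, rank)

# Compare against predicting the training mean for every unseen cross.
base_rmse = rmse(np.full(len(y_test), y_train.mean()), y_test)
mf_rmse = rmse(predict(model, test), y_test)
print(f"mean baseline RMSE: {base_rmse:.3f}, matrix factorization RMSE: {mf_rmse:.3f}")
```

The latent vectors play the role that user and item embeddings play in recommender systems: each inbred and each tester gets a small vector, and the predicted yield of an unplanted cross is the overall mean plus the dot product of the two vectors.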
