Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species

Genomic prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome-wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly because of limited genome resources and the complexity of polyploid genomes. Deep learning (DL) techniques comprise a heterogeneous collection of machine learning algorithms that have excelled at many prediction tasks. A potential advantage of DL over standard linear models for GP is that DL can, in principle, account for all genetic interactions, including dominance and epistasis, which are expected to be of special relevance in most polyploids. In this study, we evaluated the predictive accuracy of linear and DL techniques in two important small fruits or berries: strawberry and blueberry. The two datasets contained a total of 1,358 allopolyploid strawberry (2n=8x=112) and 1,802 autopolyploid blueberry (2n=4x=48) individuals, genotyped for 9,908 and 73,045 single nucleotide polymorphism (SNP) markers, respectively, and phenotyped for five agronomic traits each. DL depends on numerous hyperparameters that influence performance, and optimizing their values can be a critical step. Here we show that interactions between hyperparameters should be expected and that the number of convolutional filters and the regularization in the first layers can have an important effect on model performance. In terms of genomic prediction, we did not find an advantage of DL over linear model methods, except when the epistasis component was important. Linear Bayesian models were better than convolutional neural networks under a fully additive architecture, whereas the opposite was observed under strong epistasis. However, when parameterized to take these non-linear effects into account, Bayesian linear models can match or exceed the predictive accuracy of DL.
A semiautomatic implementation of the DL pipeline is available at https://github.com/lauzingaretti/deepGP/.
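
As an illustration of how a linear-model framework can be reparameterized to capture non-additive effects, the following minimal sketch compares an additive genomic kernel with a kernel that also carries an additive-by-additive (epistatic) term. It is not the pipeline used in this study: the data are simulated, kernel ridge regression stands in for the Bayesian linear models, and all names (`additive_kernel`, `kernel_ridge_predict`, the sample sizes, and the ridge penalty `lam`) are assumptions chosen for the example.

```python
import numpy as np

def additive_kernel(dosage, ploidy):
    """VanRaden-style additive relationship matrix from allele dosages (0..ploidy)."""
    p = dosage.mean(axis=0) / ploidy           # allele frequency per marker
    z = dosage - ploidy * p                    # centre each marker column
    return z @ z.T / (ploidy * np.sum(p * (1.0 - p)))

def kernel_ridge_predict(k, y, train, test, lam):
    """Fit kernel ridge regression on the training rows, predict the test rows."""
    mu = y[train].mean()
    k_tt = k[np.ix_(train, train)]
    alpha = np.linalg.solve(k_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + k[np.ix_(test, train)] @ alpha

# Simulated autotetraploid data (all sizes and effects are hypothetical).
rng = np.random.default_rng(0)
n, m, ploidy = 300, 200, 4
dosage = rng.binomial(ploidy, 0.3, size=(n, m)).astype(float)

# Phenotype driven by pairwise marker interactions plus noise.
z = dosage - dosage.mean(axis=0)
pairs = rng.choice(m, size=(20, 2), replace=False)
signal = sum(z[:, i] * z[:, j] for i, j in pairs)
y = signal / signal.std() + rng.normal(0.0, 0.5, size=n)

g_add = additive_kernel(dosage, ploidy)
# The Hadamard (element-wise) square of G is a common additive-by-additive kernel.
k_epi = g_add + g_add * g_add

# Single random train/test split; a real analysis would use cross-validation.
idx = rng.permutation(n)
train, test = idx[:240], idx[240:]
lam = 1.0  # hypothetical ridge penalty, not tuned

pred_add = kernel_ridge_predict(g_add, y, train, test, lam)
pred_epi = kernel_ridge_predict(k_epi, y, train, test, lam)
acc_add = np.corrcoef(pred_add, y[test])[0, 1]
acc_epi = np.corrcoef(pred_epi, y[test])[0, 1]
print(f"additive kernel accuracy:     {acc_add:.3f}")
print(f"additive + epistatic kernel:  {acc_epi:.3f}")
```

Predictive accuracy is measured here as the correlation between predicted and observed phenotypes in the held-out set. The outcome depends on the simulated architecture, the split, and the penalty, so the printed values should be read as a toy comparison rather than as evidence for either model.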
