Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction

Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because the effect of markers may vary substantially under different environmental conditions, marker-by-environment interaction effects have to be taken into account. However, modeling these interactions may dramatically increase the computational resources needed for analyzing large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework: it exploits a sparse matrix formalism where appropriate and uses distributed computing techniques so that the analysis can run on a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data and their analysis with Needles may also lead to the discovery of QTL that contribute strongly under specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions.
