Predicting Adaptive Phenotypes From Multilocus Genotypes in Sitka Spruce (Picea sitchensis) Using Random Forest

Climate is the primary driver of the distribution of tree species worldwide, and the potential for adaptive evolution will be an important factor determining the response of forests to anthropogenic climate change. Although association mapping has the potential to improve our understanding of the genomic underpinnings of climatically relevant traits, the utility of adaptive polymorphisms uncovered by such studies would be greatly enhanced by the development of integrated models that simultaneously account for the phenotypic effects of multiple single-nucleotide polymorphisms (SNPs) and their interactions. We previously reported the results of association mapping in the widespread conifer Sitka spruce (Picea sitchensis). In the current study, we used the recursive partitioning algorithm ‘Random Forest’ to identify optimized combinations of SNPs to predict adaptive phenotypes. After adjusting for population structure, we were able to explain 37% of the phenotypic variation in autumn budset timing and 30% in cold hardiness, two locally adaptive traits. For each trait, the leading five SNPs captured much of the phenotypic variation. To determine the role of epistasis in shaping these phenotypes, we also used a novel approach to quantify the strength and direction of pairwise interactions between SNPs and found such interactions to be common. Our results demonstrate the power of Random Forest to identify subsets of markers that are most important to climatic adaptation, and suggest that interactions among these loci may be widespread.
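To make "strength and direction of pairwise interactions" concrete, the sketch below computes a simple interaction contrast for every pair of SNPs. This is not the approach used in the study, which the abstract does not describe; it is a generic departure-from-additivity measure, shown under simplifying assumptions. Genotypes are coded 0/1 (absence or presence of the minor allele), population structure is ignored, and the SNP names, the toy phenotype, and the helper functions `interaction_contrast` and `pairwise_interactions` are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def interaction_contrast(geno_a, geno_b, pheno):
    """Return m11 - m10 - m01 + m00, where mXY is the mean phenotype of
    individuals carrying genotype X at SNP A and Y at SNP B.

    Zero means the two SNPs act additively; the sign gives the direction
    of the interaction and the magnitude its strength. Returns None if
    any of the four genotype classes is empty.
    """
    cells = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for a, b, y in zip(geno_a, geno_b, pheno):
        cells[(a, b)].append(y)
    if any(not values for values in cells.values()):
        return None
    m = {key: mean(values) for key, values in cells.items()}
    return m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]


def pairwise_interactions(snps, pheno):
    """Compute the interaction contrast for every pair of SNPs."""
    return {
        (name_a, name_b): interaction_contrast(geno_a, geno_b, pheno)
        for (name_a, geno_a), (name_b, geno_b) in combinations(snps.items(), 2)
    }


# Toy data (hypothetical): eight individuals, three SNPs coded 0/1.
snps = {
    "snp1": [0, 0, 1, 1, 0, 0, 1, 1],
    "snp2": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp3": [0, 0, 0, 0, 1, 1, 1, 1],
}
# snp1 and snp2 interact (the 2*a*b term); snp3 is purely additive.
pheno = [a + b + 2 * a * b + c
         for a, b, c in zip(snps["snp1"], snps["snp2"], snps["snp3"])]

result = pairwise_interactions(snps, pheno)
for pair, value in result.items():
    print(pair, value)
```

In this toy example only the snp1/snp2 pair shows a non-zero contrast, because the phenotype was constructed with an interaction term for that pair alone. With real data, three genotype classes per SNP, unbalanced cells, and population structure would all need to be handled before such a contrast could be interpreted.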
