Accurate Genotype Imputation in Multiparental Populations from Low-Coverage Sequence

Many different types of multiparental populations have recently been produced to increase genetic diversity and resolution in QTL mapping. Low-coverage genotyping-by-sequencing (GBS) has become a cost-effective genotyping tool in these populations, despite the large amounts of missing data it leaves in offspring and founders. In this work, we present a general statistical framework for genotype imputation in such experimental crosses from low-coverage GBS data. Generalizing a previously developed hidden Markov model for calculating the ancestral origins of offspring DNA, we present an imputation algorithm that does not require parental data and that is applicable to bi- and multiparental populations. The algorithm allows for heterozygosity in parents and offspring and corrects errors in observed genotypes. Further, our approach can combine imputation with genotype calling from sequencing reads, and it also applies to called genotypes from SNP array data. We evaluate the algorithm using simulated and real data sets in four different types of populations: the F2, the advanced intercross recombinant inbred lines, the multiparent advanced generation intercross, and the cross-pollinated population. Because our approach uses marker data and population design information efficiently, comparisons with previous approaches show that our imputation is accurate even at very low (<1×) sequencing depth, and that it also provides accurate genotype phasing and error detection.
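To make the idea of a hidden Markov model over ancestral origins concrete, the following is a minimal sketch and not the algorithm presented in this work. It assumes a much simpler setting than the one described above: fully inbred (haploid-like) offspring, founder alleles that are already known, a single per-interval switch probability `recomb`, and a per-read error rate `err`. All function and parameter names are hypothetical. The sketch runs a forward-backward pass over markers, uses read counts directly as emissions (so genotype calling and imputation happen together), and reports the posterior probability of the alternate allele at every marker, including markers with no reads.

```python
def read_likelihood(ref_reads, alt_reads, allele, err=0.01):
    """P(read counts | true allele), with allele 0 = reference, 1 = alternate.
    Assumption: reads are independent and each is wrong with probability err."""
    p_alt = 1.0 - err if allele == 1 else err
    return (p_alt ** alt_reads) * ((1.0 - p_alt) ** ref_reads)


def _mix(vec, recomb):
    """Apply the (symmetric) transition matrix: keep the current founder with
    probability 1 - recomb, otherwise redraw a founder uniformly at random."""
    total = sum(vec)
    n = len(vec)
    return [(1.0 - recomb) * v + recomb * total / n for v in vec]


def _normalize(vec):
    total = sum(vec)
    return [v / total for v in vec]


def posterior_ancestry(founder_alleles, reads, recomb=0.05, err=0.01):
    """Forward-backward posterior over founders at each marker.

    founder_alleles: one tuple per marker, giving each founder's allele (0/1).
    reads: one (ref_reads, alt_reads) pair per marker; (0, 0) means missing.
    """
    n_f = len(founder_alleles[0])
    emit = [
        [read_likelihood(r, a, founder_alleles[m][f], err) for f in range(n_f)]
        for m, (r, a) in enumerate(reads)
    ]

    # Forward pass with a uniform prior over founders at the first marker.
    alpha = [_normalize([e / n_f for e in emit[0]])]
    for m in range(1, len(reads)):
        pred = _mix(alpha[-1], recomb)
        alpha.append(_normalize([p * e for p, e in zip(pred, emit[m])]))

    # Backward pass; rescaling each step avoids numerical underflow.
    beta = [[1.0] * n_f for _ in reads]
    for m in range(len(reads) - 2, -1, -1):
        weighted = [e * b for e, b in zip(emit[m + 1], beta[m + 1])]
        beta[m] = _normalize(_mix(weighted, recomb))

    return [
        _normalize([a * b for a, b in zip(alpha[m], beta[m])])
        for m in range(len(reads))
    ]


def alt_dosage(founder_alleles, posterior):
    """Posterior probability of the alternate allele at each marker."""
    return [
        sum(p * a for p, a in zip(post, alleles))
        for post, alleles in zip(posterior, founder_alleles)
    ]


# Toy example: two founders that differ at every marker, reads at 2 of 5 markers.
founders = [(0, 1)] * 5
reads = [(0, 2), (0, 0), (0, 0), (0, 1), (0, 0)]
post = posterior_ancestry(founders, reads)
dosage = alt_dosage(founders, post)
```

In this toy example the three markers without reads borrow information from their neighbors, so their imputed dosages lean strongly toward the alternate allele carried by the second founder. A realistic implementation would additionally have to handle heterozygous genotypes, unknown founder genotypes, and transition probabilities derived from the breeding design.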
