Estimating IBD tracts from low coverage NGS data

MOTIVATION: The amount of identity by descent (IBD) in an individual depends on the relatedness of the individual's parents. It can also provide information about the mating system, past history and effective size of the population from which the individual was sampled.

RESULTS: Here, we present a new method for estimating inbreeding IBD tracts from low coverage next-generation sequencing (NGS) data. Unlike other methods, which use called genotypes, the method presented here uses genotype likelihoods to take the uncertainty of the data into account. We benchmark it under a wide range of biologically relevant conditions and show that the new method provides a marked increase in accuracy even at low coverage.

AVAILABILITY AND IMPLEMENTATION: The methods presented in this work were implemented in C/C++ and are freely available for non-commercial use from https://github.com/fgvieira/ngsF-HMM

CONTACT: fgvieira@snm.ku.dk

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
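
As an illustration of what working with genotype likelihoods rather than called genotypes means, the sketch below computes per-genotype log-likelihoods at a single biallelic site from the observed read bases. It is a minimal sketch under a commonly used simple error model (independent reads, errors spread evenly over the three other bases) and is not taken from ngsF-HMM; the function name `genotype_log_likelihoods` and its parameters are hypothetical.

```python
import math


def genotype_log_likelihoods(bases, error_rates, ref="A", alt="G"):
    """Log-likelihoods of the reads given 0, 1 or 2 copies of `alt`.

    Assumed model (not the ngsF-HMM implementation): reads are independent,
    each read samples one of the two chromosomes with equal probability, and
    a sequencing error turns the true base into any other base with
    probability err / 3.
    """
    def p_base(base, allele, err):
        # Probability of observing `base` when the sampled allele is `allele`.
        return 1.0 - err if base == allele else err / 3.0

    logliks = []
    for n_alt in (0, 1, 2):
        frac_alt = n_alt / 2.0  # chance that a read samples the alt allele
        total = 0.0
        for base, err in zip(bases, error_rates):
            p = ((1.0 - frac_alt) * p_base(base, ref, err)
                 + frac_alt * p_base(base, alt, err))
            total += math.log(p)
        logliks.append(total)
    return logliks


# One read supports the reference allele: all three genotypes stay plausible
# to different degrees, which a hard genotype call would discard.
single_read = genotype_log_likelihoods(["A"], [0.01])

# Two reads, one per allele: the heterozygous genotype is the most likely.
two_reads = genotype_log_likelihoods(["A", "G"], [0.01, 0.01])
```

At low coverage a site is often covered by one or two reads, so the three likelihoods remain close enough that carrying all of them forward, instead of picking one genotype, preserves the uncertainty the abstract refers to.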
