DeNovoGear: de novo indel and point mutation discovery and phasing

We present DeNovoGear, software for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis, and it uses fragment information to identify the parental origin of germ-line mutations. We used DeNovoGear on human whole-genome sequencing data to produce a set of predicted de novo insertion and/or deletion (indel) mutations with a 95% validation rate.
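To make the idea of likelihood-based de novo calling concrete, the following is a minimal sketch for a single biallelic site in a mother-father-child trio. It is not DeNovoGear's actual model or code. The binomial read-error model, the Hardy-Weinberg genotype prior, the function names, and the default values of `err`, `alt_freq`, and `mu` are all assumptions chosen for illustration.

```python
from itertools import product
from math import comb


def read_likelihood(ref, alt, genotype, err=0.01):
    """P(read counts | genotype) under an assumed binomial error model.

    genotype is the number of alternate alleles (0, 1, or 2).
    """
    p_alt = {0: err, 1: 0.5, 2: 1.0 - err}[genotype]
    return comb(ref + alt, alt) * p_alt**alt * (1.0 - p_alt) ** ref


def genotype_prior(genotype, alt_freq=0.001):
    """Assumed Hardy-Weinberg prior on a parental genotype."""
    q = alt_freq
    return [(1.0 - q) ** 2, 2.0 * q * (1.0 - q), q * q][genotype]


def allele_prob(genotype, allele):
    """P(parent transmits `allele`), where 1 = alternate and 0 = reference."""
    p_alt = genotype / 2.0
    return p_alt if allele == 1 else 1.0 - p_alt


def denovo_posterior(mother, father, child, mu=1e-8, err=0.01, alt_freq=0.001):
    """Posterior probability that at least one transmitted allele mutated.

    mother, father, child are (ref_count, alt_count) tuples.
    mu is an assumed per-allele, per-generation mutation probability.
    """
    p_total = 0.0   # P(data), summed over all hidden states
    p_no_mut = 0.0  # P(data, no mutation on either transmitted allele)
    for gm, gf in product(range(3), repeat=2):
        parents = (
            genotype_prior(gm, alt_freq)
            * genotype_prior(gf, alt_freq)
            * read_likelihood(*mother, gm, err)
            * read_likelihood(*father, gf, err)
        )
        # am, af: alleles inherited from mother and father before mutation
        for am, af in product((0, 1), repeat=2):
            p_inherit = allele_prob(gm, am) * allele_prob(gf, af)
            if p_inherit == 0.0:
                continue
            # mm, mf: whether each inherited allele mutated (flipped)
            for mm, mf in product((0, 1), repeat=2):
                p_mut = (mu if mm else 1.0 - mu) * (mu if mf else 1.0 - mu)
                gc = (am ^ mm) + (af ^ mf)  # child genotype after mutation
                term = parents * p_inherit * p_mut * read_likelihood(*child, gc, err)
                p_total += term
                if not mm and not mf:
                    p_no_mut += term
    return 1.0 - p_no_mut / p_total


if __name__ == "__main__":
    # Both parents look homozygous reference; the child looks heterozygous.
    print(denovo_posterior(mother=(30, 0), father=(30, 0), child=(15, 15)))
    # The mother also looks heterozygous, so ordinary inheritance explains the child.
    print(denovo_posterior(mother=(15, 15), father=(30, 0), child=(15, 15)))
```

Under these assumptions, the first call should give a posterior close to 1 and the second a posterior close to 0, because a heterozygous mother explains the child's reads without invoking a mutation. A production caller would need a richer error model and would handle indels and multiple alleles, which this sketch does not attempt.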
