Detecting disease-causing mutations in the human genome by haplotype matching.

Comparisons between haplotypes from affected patients and the human reference genome are frequently used to identify candidates for disease-causing mutations, even though these alignments are expected to reveal a high level of background neutral polymorphism. This limits the scope of genetic studies to relatively small genomic intervals, because current methods for distinguishing potential causal mutations from neutral variation are inefficient. Here we describe a new strategy for detecting mutations that is based on comparing affected haplotypes with closely matched control sequences from healthy individuals, rather than with the human reference genome. We use theory, simulation, and a real data set to show that this approach is expected to reduce the number of sequence variants that must be subjected to follow-up analysis by at least a factor of 20 when closely matched control sequences are selected from a reference panel with as few as 100 control genomes. We also define a reference data resource that would allow efficient application of this strategy to large critical intervals across the genome.

[1]  Peter Donnelly,et al.  A comparison of bayesian methods for haplotype reconstruction from population genotype data. , 2003, American journal of human genetics.

[2]  S. Gabriel,et al.  Calibrating a coalescent simulation of human genome sequence variation. , 2005, Genome research.

[3]  Gabor T. Marth,et al.  The Allele Frequency Spectrum in Genome-Wide Human Variation Data Reveals Signals of Differential Demographic History in Three Large World Populations , 2004, Genetics.

[4]  J. Shendure,et al.  Materials and Methods Som Text Figs. S1 and S2 Tables S1 to S4 References Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome , 2022 .

[5]  W. Li,et al.  Statistical tests of neutrality of mutations. , 1993, Genetics.

[6]  Geoffrey B. Nilsen,et al.  Whole-Genome Patterns of Common DNA Variation in Three Human Populations , 2005, Science.

[7]  R. Hudson,et al.  Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[8]  M. Feldman,et al.  Genetic Structure of Human Populations , 2002, Science.

[9]  P. Donnelly,et al.  A Fine-Scale Map of Recombination Rates and Hotspots Across the Human Genome , 2005, Science.

[10]  P. Donnelly,et al.  The Fine-Scale Structure of Recombination Rate Variation in the Human Genome , 2004, Science.

[11]  Sophie Palmer,et al.  Genetic Analysis of Completely Sequenced Disease-Associated MHC Haplotypes Identifies Shuffling of Segments in Recent Human History , 2006, PLoS genetics.

[12]  M. Olivier A haplotype map of the human genome , 2003, Nature.

[13]  T. Ohta,et al.  The age of a neutral mutant persisting in a finite population. , 1973, Genetics.

[14]  P. Donnelly,et al.  A new statistical method for haplotype reconstruction from population data. , 2001, American journal of human genetics.

[15]  R. Qiu,et al.  Targeted, haplotype-resolved resequencing of long segments of the human genome. , 2005, Genomics.

[16]  James R. Knight,et al.  Genome sequencing in microfabricated high-density picolitre reactors , 2005, Nature.

[17]  Sivakumar Gowrisankar,et al.  Pattern of sequence variation across 213 environmental response genes. , 2004, Genome research.

[18]  Richard R. Hudson,et al.  Generating samples under a Wright-Fisher neutral model of genetic variation , 2002, Bioinform..