Personalized Copy-Number and Segmental Duplication Maps using Next-Generation Sequencing

Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable owing to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73–87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10−16). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based technology.

[1]  Richard W. Hamming,et al.  Error detecting and error correcting codes , 1950 .

[2]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[3]  Patrick A. V. Hall,et al.  Approximate String Matching , 1994, Encyclopedia of Algorithms.

[4]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[5]  D. Lipman,et al.  Improved tools for biological sequence comparison. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[6]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[7]  G Hermanson,et al.  High-resolution mapping of human chromosome 11 by in situ hybridization with cosmid clones. , 1990, Science.

[8]  H. Hobbs,et al.  Molecular definition of the extreme size polymorphism in apolipoprotein(a). , 1993, Human molecular genetics.

[9]  D G Higgins,et al.  CLUSTAL V: multiple alignment of DNA and protein sequences. , 1994, Methods in molecular biology.

[10]  Andrzej Wozniak,et al.  Using video-oriented instructions to speed up sequence comparison , 1997, Comput. Appl. Biosci..

[11]  G. Benson,et al.  Tandem repeats finder: a program to analyze DNA sequences. , 1999, Nucleic acids research.

[12]  Torbjørn Rognes,et al.  Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors , 2000, Bioinform..

[13]  J. Mullikin,et al.  SSAHA: a fast search method for large DNA databases. , 2001, Genome research.

[14]  B. Trask,et al.  Segmental duplications: organization and impact within the current human genome project assembly. , 2001, Genome research.

[15]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[16]  Bin Ma,et al.  PatternHunter: faster and more sensitive homology search , 2002, Bioinform..

[17]  M. Adams,et al.  Recent Segmental Duplications in the Human Genome , 2002, Science.

[18]  Xavier Estivill,et al.  Chromosomal regions containing high-density and ambiguously mapped putative single nucleotide polymorphisms (SNPs) correlate with segmental duplications in the human genome. , 2002, Human molecular genetics.

[19]  D. Haussler,et al.  Human-mouse alignments with BLASTZ. , 2003, Genome research.

[20]  Daniel Pinkel,et al.  Large-scale variation among human and great ape genomes determined by array comparative genomic hybridization. , 2003, Genome research.

[21]  Randall A. Bolanos,et al.  Whole-genome shotgun assembly and comparison of human genome assemblies , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[22]  L. Feuk,et al.  Detection of large-scale variation in the human genome , 2004, Nature Genetics.

[23]  E. Eichler,et al.  Shotgun sequence assembly and recent segmental duplications within the human genome , 2004, Nature.

[24]  E. Eichler,et al.  Segmental duplications and copy-number variation in the human genome. , 2005, American journal of human genetics.

[25]  E. Eichler,et al.  Fine-scale structural variation of the human genome , 2005, Nature Genetics.

[26]  B. Rovin,et al.  The Influence of CCL 3 L 1 Gene – Containing Segmental Duplications on HIV-1 / AIDS Susceptibility , 2009 .

[27]  D. Conrad,et al.  Global variation in copy number in the human genome , 2006, Nature.

[28]  Yang Liu,et al.  GPU Accelerated Smith-Waterman , 2006, International Conference on Computational Science.

[29]  Alejandro A. Schäffer,et al.  WindowMasker: window-based masker for sequenced genomes , 2006, Bioinform..

[30]  Bernhard Radlwimmer,et al.  A chromosome 8 gene-cluster polymorphism with low human beta-defensin 2 gene copy number predisposes to Crohn disease of the colon. , 2006, American journal of human genetics.

[31]  Enrico Petretto,et al.  Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans , 2006, Nature.

[32]  E. Eichler,et al.  Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. , 2006, American journal of human genetics.

[33]  Philip M. Kim,et al.  Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome , 2007, Science.

[34]  Bi Zhou,et al.  Gene copy-number variation and associated polymorphisms of complement component C4 in human systemic lupus erythematosus (SLE): low copy number is a risk factor for and high copy number is a protective factor against SLE susceptibility in European Americans. , 2007, American journal of human genetics.

[35]  Philippe Froguel,et al.  FCGR3B copy number variation is associated with susceptibility to systemic, but not organ-specific, autoimmunity , 2007, Nature Genetics.

[36]  Michael Farrar,et al.  Sequence analysis Striped Smith – Waterman speeds database searches six times over other SIMD implementations , 2007 .

[37]  Tatiana Tatusova,et al.  NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins , 2004, Nucleic Acids Res..

[38]  E. Eichler,et al.  Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution , 2007, Nature Genetics.

[39]  J. Lupski,et al.  The complete genome of an individual by massively parallel DNA sequencing , 2008, Nature.

[40]  Gabor T. Marth,et al.  Rapid whole-genome mutational profiling using next-generation sequencing technologies. , 2008, Genome research.

[41]  Joshua M. Korn,et al.  Integrated detection and population-genetic analysis of SNPs and copy number variation , 2008, Nature Genetics.

[42]  André Reis,et al.  Psoriasis is associated with increased β-defensin genomic copy number , 2008, Nature Genetics.

[43]  Antony V. Cox,et al.  Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing , 2008, Nature Genetics.

[44]  Gabor T. Marth,et al.  Whole-genome sequencing and variant discovery in C. elegans , 2008, Nature Methods.

[45]  Ruiqiang Li,et al.  SOAP: short oligonucleotide alignment program , 2008, Bioinform..

[46]  Nancy F. Hansen,et al.  Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry , 2008, Nature.

[47]  Vladimir Yanovski,et al.  Read Mapping Algorithms for Single Molecule Sequencing Data , 2008, WABI.

[48]  Joshua M. Korn,et al.  Mapping and sequencing of structural variation from eight human genomes , 2008, Nature.

[49]  R. Durbin,et al.  Mapping Quality Scores Mapping Short Dna Sequencing Reads and Calling Variants Using P

, 2022 .

[50]  E. Eichler,et al.  Systematic assessment of copy number variant detection via genome-wide SNP genotyping , 2008, Nature Genetics.

[51]  Dawei Li,et al.  The diploid genome sequence of an Asian individual , 2008, Nature.

[52]  André Reis,et al.  Psoriasis is associated with increased beta-defensin genomic copy number. , 2008, Nature genetics.

[53]  Arcadi Navarro,et al.  A burst of segmental duplications in the genome of the African great ape ancestor , 2009, Nature.

[54]  Derek Y. Chiang,et al.  High-resolution mapping of copy-number alterations with massively parallel sequencing , 2009, Nature Methods.

[55]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .