Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding.

We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding approximately 18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.
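The error-correction property of two-base encoding mentioned above can be illustrated with a minimal sketch. The abstract does not spell out the encoding, so the mapping used here is an assumption: it is the commonly described color-space scheme, in which each color is the XOR of two adjacent bases' 2-bit codes. The function names are hypothetical and this is not the authors' pipeline. Under this scheme, a true single-base change in the interior of a read alters two adjacent colors in a consistent way, whereas a single measurement error alters only one.

```python
# Assumed 2-bit base codes; a color is the XOR of two adjacent bases.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_colors(seq):
    """Return one color (0-3) per adjacent base pair of seq."""
    return [BASE_BITS[a] ^ BASE_BITS[b] for a, b in zip(seq, seq[1:])]

def classify_mismatches(ref_colors, read_colors):
    """Label color mismatches between equal-length color lists.

    Two adjacent mismatches whose combined XOR matches the reference are
    consistent with a single-base substitution ('snp'); an isolated
    mismatch is treated as a likely measurement error ('error').
    """
    calls = []
    i, n = 0, len(ref_colors)
    while i < n:
        if ref_colors[i] == read_colors[i]:
            i += 1
            continue
        nxt = i + 1
        consistent_pair = (
            nxt < n
            and ref_colors[nxt] != read_colors[nxt]
            and (ref_colors[i] ^ ref_colors[nxt])
            == (read_colors[i] ^ read_colors[nxt])
        )
        if consistent_pair:
            calls.append(("snp", i))    # i is the first of the two colors
            i += 2
        else:
            calls.append(("error", i))
            i += 1
    return calls

ref = encode_colors("ATGGCA")
snp_read = encode_colors("ATTGCA")   # third base changed G -> T
noisy_read = list(ref)
noisy_read[2] ^= 2                   # one color corrupted in isolation
print(classify_mismatches(ref, snp_read))
print(classify_mismatches(ref, noisy_read))
```

This sketch ignores read ends, where a substitution changes only one color, and it does not model indels or adjacent substitutions.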
