Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

Background: Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively they account for a significant fraction of genetic polymorphism and disease. The number of base pairs that differ because of SVs is much greater (>100-fold) than the number that differ because of point mutations. However, no current detection method is comprehensive, and available methodologies cannot provide sufficient resolution or unambiguous information across complex regions of the human genome. To address these challenges, we applied a high-throughput, cost-effective genome mapping technology that uses long single molecules (>150 kb) to discover SVs genome-wide and to characterize complex regions of the YH genome.

Results: Using nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger than 1 kb. After excluding the 59 SVs (54 insertions/deletions, 5 inversions) that overlap N-base gaps in the reference assembly hg19, 666 non-gap SVs remained, of which 396 (60%) were verified by paired-end data from whole-genome re-sequencing or by de novo assembled sequence from fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 of 666 (90%) variants were supported by orthogonal experimental methods or by historical evidence in public databases. Genome mapping also provides valuable information on complex regions, including their haplotypes, in a straightforward fashion. In addition, using long single-molecule labeling patterns, exogenous viral sequences were mapped on a whole-genome scale, and sample heterogeneity was analyzed at a new level.

Conclusion: Our study highlights genome mapping technology as a comprehensive and cost-effective method for detecting structural variation and studying complex regions in the human genome, as well as for deciphering viral integration into the host genome.
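The counts reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below only re-derives the totals from the numbers stated in the abstract; the variable names are illustrative assumptions and are not part of the study's pipeline.

```python
# Counts copied from the abstract; variable names are hypothetical.
indels = 708                 # insertions/deletions larger than 1 kb
inversions = 17              # inversions larger than 1 kb
gap_overlapping = 54 + 5     # indels + inversions overlapping N-base gaps in hg19

total_calls = indels + inversions
non_gap = total_calls - gap_overlapping

validated = 396              # supported by paired-end or fosmid assembly data
unvalidated = non_gap - validated
in_dgv = 213                 # unvalidated calls overlapping known SVs in DGV
supported = validated + in_dgv

validated_fraction = validated / non_gap
supported_fraction = supported / non_gap

print(f"total calls: {total_calls}")
print(f"non-gap SVs: {non_gap}")
print(f"not validated by sequencing: {unvalidated}")
print(f"supported overall: {supported}")
print(f"validated fraction: {validated_fraction:.3f}")
print(f"supported fraction: {supported_fraction:.3f}")
```

The derived totals (725 calls, 666 non-gap SVs, 270 not validated by sequencing, 609 supported overall) match the abstract, and the two fractions come out at roughly 0.59 and 0.91, close to the rounded 60% and 90% it quotes.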
