Identifying Human Genome-Wide CNV, LOH and UPD by Targeted Sequencing of Selected Regions

Copy-number variations (CNVs), loss of heterozygosity (LOH), and uniparental disomy (UPD) are large genomic aberrations that underlie many common inherited diseases, cancers, and other complex diseases. An integrated tool for identifying these aberrations is essential for understanding disease and for designing clinical interventions. Previous discovery methods based on whole-genome sequencing (WGS) require high depth of coverage across the entire genome, which makes them cost-inefficient. Another approach, whole-exome genome sequencing (WEGS), can discover variations only within exons. As a result, efficient methods for detecting genomic aberrations on a genome-wide scale with next-generation sequencing are still lacking. Here we present a method to identify genome-wide CNV, LOH and UPD in the human genome by selectively sequencing a small portion of the genome, termed Selected Target Regions (SeTRs). In our experiments, 99.73% to 99.95% of the SeTRs are covered at sufficient depth. Our bioinformatics pipeline calls genome-wide CNVs with high confidence and reveals 8 credible LOH events and 3 UPD events larger than 5 Mb in 15 individual samples. We demonstrate that genome-wide CNV, LOH and UPD can be detected with a cost-effective SeTR sequencing approach, and that LOH and UPD can be identified with a sample-grouping technique alone, without a matched sample or familial information.
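The abstract reports that 99.73% to 99.95% of the SeTRs are covered at sufficient depth. As a rough illustration of how such a breadth-of-coverage figure can be computed, here is a minimal sketch. It assumes per-base depths have already been extracted (for example from `samtools depth` output) into one list per chromosome, that target regions are non-overlapping 0-based half-open intervals, and that the analyst chooses the depth threshold. The source does not state the threshold it used, and the helper `fraction_covered` and its `min_depth` parameter are hypothetical names, not part of the authors' pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a target region: (chromosome, start, end),
# 0-based and half-open [start, end). Regions are assumed not to overlap.
Region = Tuple[str, int, int]


def fraction_covered(
    regions: List[Region],
    depth: Dict[str, List[int]],
    min_depth: int,
) -> float:
    """Return the fraction of target bases whose depth is >= min_depth.

    `depth` maps a chromosome name to a per-base depth list indexed by
    0-based position. Positions missing from the list count as depth 0.
    """
    total = 0
    covered = 0
    for chrom, start, end in regions:
        per_base = depth.get(chrom, [])
        for pos in range(start, end):
            total += 1
            d = per_base[pos] if pos < len(per_base) else 0
            if d >= min_depth:
                covered += 1
    # An empty target set has no covered bases; avoid dividing by zero.
    return covered / total if total else 0.0


if __name__ == "__main__":
    # Toy data: 8 target bases, 5 of which reach a depth of at least 10.
    depth = {"chr1": [0, 5, 12, 20, 8, 30, 25, 0, 15, 10]}
    regions = [("chr1", 1, 6), ("chr1", 7, 10)]
    print(fraction_covered(regions, depth, min_depth=10))  # 0.625
```

Counting bases rather than regions means that long SeTRs weigh proportionally more. If the real target set contains overlapping intervals, they should be merged first so that no base is counted twice.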
