SM-RCNV: a statistical method to detect recurrent copy number variations in sequenced samples

Background: Copy number variation (CNV) is an important form of genomic structural variation and is linked to dozens of human diseases. Developing computational methods that characterize such structural variants from next-generation sequencing (NGS) data is important for understanding the mechanisms of disease.

Objective: The objective of this study is to develop a new statistical method for detecting recurrent CNVs across multiple samples from genomic sequences.

Methods: A statistical method, referred to as SM-RCNV, is developed to detect recurrent CNVs. The method associates a statistic with each genomic location by combining the frequency of variation at that location across all samples with the correlation among consecutive locations. The weights of the frequency and correlation terms are trained on real datasets with known CNVs. A p-value is assessed for each location on the genome by permutation testing.

Results: Compared with six peer methods, SM-RCNV outperforms them in terms of receiver operating characteristic (ROC) curves. SM-RCNV successfully identifies many consistent recurrent CNVs, most of which are known to be of biological significance and associated with disease genes. The validation rates of SM-RCNV on the CEU and YRI call sets against the Database of Genomic Variants are 258/328 (79%) and 157/309 (51%), respectively.

Conclusion: SM-RCNV is a well-grounded statistical framework for detecting recurrent CNVs from multiple genomic sequences, providing valuable information for studying genomes in human diseases. The source code is freely available at https://sourceforge.net/projects/sm-rcnv/.
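The idea summarized in the Methods (a per-location statistic that mixes cross-sample frequency with agreement between consecutive locations, followed by permutation testing) can be illustrated with a minimal sketch. This is not the SM-RCNV implementation: the function names, the fixed weights `w_freq` and `w_corr` (the paper trains its weights on real data), the use of neighbour co-occurrence as a stand-in for correlation, and the circular-shift permutation scheme with a max-statistic null are all assumptions made for illustration.

```python
import numpy as np

def location_statistic(calls, w_freq=0.5, w_corr=0.5):
    """Hypothetical per-location score.

    calls: (n_samples, n_locations) array of 0/1 variation indicators.
    w_freq, w_corr: illustrative fixed weights (assumed, not trained).
    """
    calls = np.asarray(calls, dtype=float)
    # Fraction of samples that show a variation at each location.
    freq = calls.mean(axis=0)
    # Fraction of samples varied at both location j and j+1; a simple
    # stand-in for "correlation among consecutive locations".
    agree = (calls[:, :-1] * calls[:, 1:]).mean(axis=0)
    corr = np.zeros(calls.shape[1])
    corr[:-1] += agree
    corr[1:] += agree
    corr[1:-1] /= 2  # interior locations average their two neighbours
    return w_freq * freq + w_corr * corr

def permutation_pvalues(calls, n_perm=200, seed=0, **weights):
    """Permutation p-value per location (assumed scheme).

    Each sample is circularly shifted by a random offset, which keeps
    within-sample segments intact but breaks alignment across samples.
    The observed score is compared with the maximum null score.
    """
    rng = np.random.default_rng(seed)
    calls = np.asarray(calls, dtype=float)
    observed = location_statistic(calls, **weights)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        shifts = rng.integers(0, calls.shape[1], size=calls.shape[0])
        permuted = np.stack([np.roll(row, s) for row, s in zip(calls, shifts)])
        null = location_statistic(permuted, **weights)
        exceed += null.max() >= observed
    # Add-one correction so that no p-value is exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

Locations whose p-value falls below a chosen threshold would be reported as candidate recurrent CNV sites. Comparing against the maximum null score is a conservative choice that accounts for testing many locations at once; a per-location null would be a reasonable alternative.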
