Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization

Summary: We present a tool for control-free copy number alteration (CNA) detection using deep-sequencing data, particularly useful for cancer studies. The tool deals with two frequent problems in the analysis of cancer deep-sequencing data: absence of control sample and possible polyploidy of cancer cells. FREEC (control-FREE Copy number caller) automatically normalizes and segments copy number profiles (CNPs) and calls CNAs. If ploidy is known, FREEC assigns absolute copy number to each predicted CNA. To normalize raw CNPs, the user can provide a control dataset if available; otherwise GC content is used. We demonstrate that for Illumina single-end, mate-pair or paired-end sequencing, GC-contentr normalization provides smooth profiles that can be further segmented and analyzed in order to predict CNAs. Availability: Source code and sample data are available at http://bioinfo-out.curie.fr/projects/freec/. Contact: freec@curie.fr Supplementary information: Supplementary data are available at Bioinformatics online.

[1]  Nancy F. Hansen,et al.  Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry , 2008, Nature.

[2]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[3]  Kenny Q. Ye,et al.  Sensitive and accurate detection of copy number variants using read depth of coverage. , 2009, Genome research.

[4]  Juliane C. Dohm,et al.  Substantial biases in ultra-short read data sets from high-throughput DNA sequencing , 2008, Nucleic acids research.

[5]  Derek Y. Chiang,et al.  High-resolution mapping of copy-number alterations with massively parallel sequencing , 2009, Nature Methods.

[6]  Aaron R. Quinlan,et al.  BIOINFORMATICS APPLICATIONS NOTE , 2022 .

[7]  J. Kitzman,et al.  Personalized Copy-Number and Segmental Duplication Maps using Next-Generation Sequencing , 2009, Nature Genetics.

[8]  Tom Royce,et al.  A comprehensive catalogue of somatic mutations from a human cancer genome , 2010, Nature.

[9]  G. Mutter,et al.  PCR bias in amplification of androgen receptor alleles, a trinucleotide repeat marker used in clonality studies. , 1995, Nucleic acids research.

[10]  Chao Xie,et al.  CNV-seq, a new method to detect copy number variation using high-throughput sequencing , 2009, BMC Bioinformatics.

[11]  Antony V. Cox,et al.  Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing , 2008, Nature Genetics.

[12]  Zaïd Harchaoui,et al.  Catching Change-points with Lasso , 2007, NIPS.

[13]  S. Giovannoni,et al.  Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR , 1996, Applied and environmental microbiology.