Supplemental Material to: "XCAVATOR: accurate detection and genotyping of copy number variants from second and third generation whole-genome sequencing experiments."

Background: We developed a novel software package, XCAVATOR, for the identification of genomic regions involved in copy number variants/alterations (CNVs/CNAs) from short- and long-read whole-genome sequencing experiments.

Results: Using simulated and real datasets, we showed that our tool, which is based on a read count approach, can predict the boundaries and the absolute number of DNA copies of CNVs/CNAs at high resolution. To demonstrate the power of our software, we applied it to the analysis of Illumina and Pacific Biosciences data and compared its performance with that of ten other state-of-the-art tools.

Conclusion: All the analyses we performed demonstrate that XCAVATOR can detect germline and somatic CNVs/CNAs, outperforming all the other tools in our comparison. XCAVATOR is freely available at http://sourceforge.net/projects/xcavator/.
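To make the idea of a read count approach concrete, the following is a minimal Python sketch of the general technique. It is not XCAVATOR's algorithm, which this abstract does not describe. The function names, the fixed-size windows, the median-based normalization, and the default diploid baseline are all assumptions made for illustration.

```python
import math
from statistics import median


def bin_read_counts(read_starts, genome_length, bin_size):
    """Count reads per fixed-size window (hypothetical helper).

    read_starts: 0-based start positions, each smaller than genome_length.
    """
    n_bins = math.ceil(genome_length / bin_size)
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def estimate_copy_number(test_counts, control_counts, ploidy=2):
    """Estimate an integer copy number per window from test/control counts.

    Assumption: most windows are copy-neutral, so the median test/control
    ratio can serve as the baseline that corresponds to `ploidy` copies.
    Windows with no control reads are reported as None.
    """
    ratios = [t / c if c > 0 else None
              for t, c in zip(test_counts, control_counts)]
    valid = [r for r in ratios if r is not None]
    if not valid:
        raise ValueError("no window has control reads")
    baseline = median(valid)
    if baseline == 0:
        raise ValueError("median ratio is zero; cannot normalize")
    return [None if r is None else round(ploidy * r / baseline)
            for r in ratios]


if __name__ == "__main__":
    control = [100, 100, 100, 100, 100]
    test = [100, 100, 50, 100, 150]
    print(estimate_copy_number(test, control))
```

In this toy input the third window has half the baseline read count and the fifth has one and a half times the baseline, so they are called as a one-copy loss and a one-copy gain. Real data would also need corrections for GC content and mappability, plus a segmentation step to locate boundaries, none of which are shown here.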
