Integrative pipeline for profiling DNA copy number and inferring tumor phylogeny

Summary: Copy number variation is an important and abundant source of variation in the human genome and has been associated with a number of diseases, especially cancer. Massively parallel next-generation sequencing allows copy number profiling at fine resolution. Such efforts, however, have met with mixed success, with setbacks arising partly from the lack of reliable analytical methods to meet the diverse and unique challenges posed by the myriad experimental designs and study goals in genetic studies. In cancer genomics, detection of somatic copy number changes and profiling of allele-specific copy number (ASCN) are complicated by experimental biases and artifacts as well as by normal cell contamination and cancer subclone admixture. Furthermore, careful statistical modeling is warranted to reconstruct tumor phylogeny from both somatic ASCN changes and single nucleotide variants. Here we describe a flexible computational pipeline, MARATHON, which integrates multiple related statistical software packages for copy number profiling and downstream analyses in disease genetic studies.

Availability and implementation: MARATHON is publicly available at https://github.com/yuchaojiang/MARATHON.

Supplementary information: Supplementary data are available at Bioinformatics online.
