SEG - A Software Program for Finding Somatic Copy Number Alterations in Whole Genome Sequencing Data of Cancer

As next-generation sequencing technology advances and its cost decreases, whole genome sequencing (WGS) has become the preferred platform for identifying somatic copy number alteration (CNA) events in cancer genomes. To decipher these massive sequencing data more effectively, we developed a software program named SEG, short for “segment”. SEG uses mapped read or fragment density for CNA discovery. To reduce CNA artifacts arising from sequencing and mapping biases, SEG first normalizes the data by taking the log2-ratio of each tumor density against its matching normal density. SEG then uses dynamic programming to find change-points in the contiguous log2-ratio data series along a chromosome, dividing the chromosome into segments. Finally, SEG identifies the segments that carry CNAs. Our analyses of both simulated and real sequencing data indicate that SEG finds more small CNAs than other published software tools.
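
The three steps described above (log2-ratio normalization, change-point detection by dynamic programming, and CNA calling) can be illustrated with a minimal sketch. This is not SEG's actual implementation: the abstract does not specify the cost function, the penalty, or the calling rule, so the penalized least-squares cost, the pseudocount, the `penalty` parameter, the `threshold` value, and all function names below are assumptions chosen for illustration only.

```python
import math
from itertools import accumulate


def log2_ratios(tumor, normal, pseudo=0.5):
    """Log2-ratio of tumor vs. matched normal density per window.

    The pseudocount (an assumption) avoids division by zero in empty windows.
    """
    return [math.log2((t + pseudo) / (n + pseudo)) for t, n in zip(tumor, normal)]


def segment(values, penalty):
    """Optimal partitioning by dynamic programming, O(n^2) time.

    Minimizes the sum over segments of (squared deviation from the
    segment mean + penalty). Returns (start, end, mean) tuples, end exclusive.
    """
    n = len(values)
    # Prefix sums let us evaluate any segment's cost in constant time.
    s1 = [0.0] + list(accumulate(values))
    s2 = [0.0] + list(accumulate(v * v for v in values))

    def cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n   # best[j]: minimal cost of values[:j]
    prev = [0] * (n + 1)            # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j], prev[j] = c, i

    # Backtrack from the end to recover the segment boundaries.
    bounds = []
    j = n
    while j > 0:
        bounds.append((prev[j], j))
        j = prev[j]
    bounds.reverse()
    return [(i, j, (s1[j] - s1[i]) / (j - i)) for i, j in bounds]


def call_cna(segments, threshold=0.3):
    """Label each segment by its mean log2-ratio (threshold is hypothetical)."""
    calls = []
    for start, end, mean in segments:
        if mean >= threshold:
            label = "gain"
        elif mean <= -threshold:
            label = "loss"
        else:
            label = "neutral"
        calls.append((start, end, label))
    return calls
```

For example, with tumor densities `[100]*5 + [200]*4 + [100]*5`, normal densities `[100]*14`, and `penalty=0.5`, the sketch splits the series into three segments and labels the middle one (windows 5 to 8) as a gain. A larger penalty yields fewer, longer segments, so in this simplified formulation the penalty governs how readily small CNAs are reported.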
