DBS: a fast and informative segmentation algorithm for DNA copy number analysis

Background: Genome-wide DNA copy number changes are hallmark events in the initiation and progression of cancers. Quantitative analysis of somatic copy number alterations (CNAs) has broad applications in cancer research. With the increasing capacity of high-throughput sequencing technologies, fast and efficient segmentation algorithms are required to characterize high-density CNA data.

Results: A fast and informative segmentation algorithm, DBS (Deviation Binary Segmentation), is developed and discussed. The DBS method is based on the least absolute error principle and is inspired by the segmentation method rooted in the circular binary segmentation procedure. DBS uses point-by-point model calculation to ensure the accuracy of segmentation, and combines a binary search algorithm with heuristics derived from the Central Limit Theorem. The DBS algorithm is efficient, with a computational complexity of O(n log n), and is faster than its predecessors. Moreover, DBS measures the change-point amplitude, that is, the difference between the mean values of the two adjacent segments at a breakpoint; the degree of significance of this amplitude is determined by the weighted average deviation at breakpoints. Accordingly, using the constructed binary tree of significance degrees, DBS indicates whether a segmentation result is over- or under-segmented.

Conclusion: DBS is implemented in a platform-independent, open-source Java application (ToolSeg), which includes a graphical user interface, simulation data generation, and various segmentation methods, all written in native Java.
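To make the idea of recursive binary splitting concrete, the following is a minimal sketch of generic binary segmentation on a one-dimensional signal. It is not the DBS procedure itself: the scoring rule (absolute difference of the two segment means, scaled by segment sizes), the fixed `threshold` parameter, and the function names `best_split` and `binary_segmentation` are assumptions chosen for illustration, and the sketch does not implement the deviation-based significance measure or the search heuristics described in the abstract. Each split here scans its interval linearly, so the sketch does not reproduce the O(n log n) bound claimed for DBS.

```python
from itertools import accumulate
from math import sqrt


def best_split(prefix, lo, hi):
    """Find the split k of x[lo:hi] with the largest scaled mean difference.

    `prefix` holds prefix sums of the signal, so each segment mean costs O(1).
    Returns (score, k); k is None when no split has a positive score.
    """
    n = hi - lo
    best_score, best_k = 0.0, None
    for k in range(lo + 1, hi):
        n_left, n_right = k - lo, hi - k
        mean_left = (prefix[k] - prefix[lo]) / n_left
        mean_right = (prefix[hi] - prefix[k]) / n_right
        # Hypothetical score: mean difference weighted by segment sizes.
        score = abs(mean_left - mean_right) * sqrt(n_left * n_right / n)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, best_k


def binary_segmentation(x, threshold):
    """Recursively split x wherever the best split score exceeds `threshold`.

    `threshold` is an illustrative tuning parameter, not a value from the paper.
    Returns the sorted list of breakpoint indices.
    """
    prefix = [0.0] + list(accumulate(x))
    breakpoints = []
    stack = [(0, len(x))]  # intervals still to examine, as [lo, hi)
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:
            continue
        score, k = best_split(prefix, lo, hi)
        if k is not None and score > threshold:
            breakpoints.append(k)
            stack.append((lo, k))
            stack.append((k, hi))
    return sorted(breakpoints)
```

For a noise-free signal of ten 0.0 values, ten 1.0 values, and ten 0.0 values, `binary_segmentation(signal, 1.0)` returns `[10, 20]`. On real copy number data the threshold would have to account for noise, which is the role the significance measure plays in DBS.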
