A Linear-Time Algorithm for the Copy Number Transformation Problem

Problems of genome rearrangement are central to the study of both evolution and cancer. Most evolutionary scenarios have been studied under the assumption that the genome contains a single copy of each gene. In contrast, tumor genomes undergo deletions and duplications, so the number of copies of each gene varies. The number of copies of each segment along a chromosome is called its copy number profile (CNP). Understanding CNP changes can assist in predicting disease progression and treatment. To date, questions related to distances between CNPs have received little scientific attention. Here we focus on the following fundamental problem, introduced by Schwarz et al.: given two CNPs, u and v, compute the minimum number of edit operations that transform u into v, where the edit operations are segmental deletions and amplifications. We establish the computational complexity of this problem, showing that it is solvable in linear time and constant space.
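To make the problem statement concrete, the following is a minimal brute-force sketch for tiny instances. It is not the linear-time algorithm announced above, which the abstract does not describe. It assumes a model the abstract does not spell out: a CNP is a tuple of non-negative integers, a deletion (or amplification) decreases (or increases) by one every positive entry in a contiguous segment, and an entry that has reached zero stays at zero. The names `apply_op` and `brute_force_distance`, and the `cap` on intermediate values used to keep the search finite, are hypothetical choices made for illustration.

```python
from collections import deque


def apply_op(profile, lo, hi, delta):
    """Apply one segmental operation to positions lo..hi (inclusive).

    delta = -1 is a deletion, delta = +1 an amplification.
    Assumption: entries equal to zero are never changed.
    """
    return tuple(
        x + delta if lo <= i <= hi and x > 0 else x
        for i, x in enumerate(profile)
    )


def brute_force_distance(u, v, cap=None):
    """Breadth-first search for the fewest operations turning u into v.

    Exponential in general; intended only for very short profiles.
    Assumption: intermediate entries never exceed `cap`
    (default: the largest entry of u or v).
    Returns None if v is not reachable within that restriction.
    """
    u, v = tuple(u), tuple(v)
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    if cap is None:
        cap = max(max(u, default=0), max(v, default=0))

    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        profile, dist = queue.popleft()
        if profile == v:
            return dist
        n = len(profile)
        for lo in range(n):
            for hi in range(lo, n):
                for delta in (-1, 1):
                    nxt = apply_op(profile, lo, hi, delta)
                    if nxt not in seen and max(nxt) <= cap:
                        seen.add(nxt)
                        queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    print(brute_force_distance((1, 1, 1), (2, 2, 2)))        # 1
    print(brute_force_distance((1, 1, 1, 1), (2, 1, 2, 1)))  # 2
    print(brute_force_distance((0, 1), (1, 1)))              # None
```

Under these assumptions, the last call returns `None` because a segment whose copy number has dropped to zero cannot be regained. A reference search of this kind can only check a faster algorithm on small inputs.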
