Fast computation of a string duplication history under no-breakpoint-reuse

In this paper, we provide an O(n log^2 n log log n log* n) algorithm to compute a duplication history of a string under the no-breakpoint-reuse condition. The motivation for this problem stems from computational biology, in particular from the analysis of complex gene clusters. The problem is also related to computing edit distance with block operations; in our scenario, however, the start of the history is not fixed but is chosen to minimize the distance measure.
