Novel Definition and Algorithm for Chaining Fragments with Proportional Overlaps

Chaining fragments is a crucial step in genome alignment. Existing chaining algorithms compute a maximum weighted chain in which adjacent fragments are not allowed to overlap. In practice, using local alignments as fragments instead of Maximal Exact Matches (MEMs) produces frequent overlaps between fragments, for combinatorial reasons and because of biological factors, i.e., variable tandem repeat structures whose number of copies differs between genomic sequences. In this article, to lift this limitation, we formulate a novel definition of a chain that allows overlaps proportional to the fragment lengths, and we present an efficient algorithm for computing such a maximum weighted chain. We tested our algorithm on a dataset of 694 genome pairs and observed significant improvements in coverage, while keeping running times within reasonable limits. Moreover, experiments with different ratios of allowed overlap showed that the chains are robust with respect to these ratios. Our algorithm is implemented in a tool called OverlapChainer (OC), which is available from the authors upon request.
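To make the idea of a chain with proportional overlaps concrete, the following is a minimal sketch and not the algorithm of the article or the code of OverlapChainer. It assumes that a fragment is a pair of closed intervals (one per genome) with a weight, that two adjacent fragments may overlap on each genome by at most `ratio` times the length of the shorter of the two, and that the chain weight is the plain sum of fragment weights. All names (`Fragment`, `can_follow`, `best_chain`, `ratio`) are hypothetical, and the dynamic program is a simple quadratic one chosen for readability rather than efficiency.

```python
from typing import List, NamedTuple, Optional, Tuple


class Fragment(NamedTuple):
    s1: int        # start on genome 1 (inclusive)
    e1: int        # end on genome 1 (inclusive)
    s2: int        # start on genome 2 (inclusive)
    e2: int        # end on genome 2 (inclusive)
    weight: float


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two closed intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def can_follow(f: Fragment, g: Fragment, ratio: float) -> bool:
    """True if g may directly follow f in a chain (assumed definition)."""
    # g must start and end strictly after f on both genomes.
    if not (f.s1 < g.s1 and f.e1 < g.e1 and f.s2 < g.s2 and f.e2 < g.e2):
        return False
    # Assumption: the allowed overlap is proportional to the shorter fragment.
    limit1 = ratio * min(f.e1 - f.s1 + 1, g.e1 - g.s1 + 1)
    limit2 = ratio * min(f.e2 - f.s2 + 1, g.e2 - g.s2 + 1)
    return (overlap(f.s1, f.e1, g.s1, g.e1) <= limit1
            and overlap(f.s2, f.e2, g.s2, g.e2) <= limit2)


def best_chain(fragments: List[Fragment], ratio: float) -> Tuple[float, List[int]]:
    """Quadratic-time DP; returns (total weight, indices of chained fragments)."""
    if not fragments:
        return 0.0, []
    # Any valid predecessor starts earlier on genome 1, so it comes first here.
    order = sorted(range(len(fragments)),
                   key=lambda i: (fragments[i].s1, fragments[i].e1))
    score = {}
    prev: dict = {}
    for pos, i in enumerate(order):
        score[i] = fragments[i].weight
        prev[i] = None
        for j in order[:pos]:
            candidate = score[j] + fragments[i].weight
            if can_follow(fragments[j], fragments[i], ratio) and candidate > score[i]:
                score[i] = candidate
                prev[i] = j
    # Backtrack from the best-scoring chain end.
    end: Optional[int] = max(score, key=score.get)
    total = score[end]
    chain = []
    while end is not None:
        chain.append(end)
        end = prev[end]
    return total, chain[::-1]


fragments = [
    Fragment(0, 99, 0, 99, 100.0),
    Fragment(90, 189, 90, 189, 100.0),   # overlaps the first fragment by 10 positions
    Fragment(200, 299, 200, 299, 100.0),
]
score_with_overlap, chain_with_overlap = best_chain(fragments, ratio=0.2)
score_no_overlap, chain_no_overlap = best_chain(fragments, ratio=0.0)
```

With `ratio=0.0` the sketch reduces to classical chaining without overlaps, so only two of the three example fragments can be chained; with `ratio=0.2` all three fit in one chain, which illustrates how tolerating overlaps can increase coverage.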
