Novel Definition and Algorithm for Chaining Fragments with Proportional Overlaps

Chaining fragments is a crucial step in genome alignment. Existing chaining algorithms compute a maximum weighted chain in which adjacent fragments are not allowed to overlap. In practice, using local alignments as fragments, instead of maximal exact matches (MEMs), frequently produces overlapping fragments, for both combinatorial and biological reasons, such as variable tandem repeat structures whose copy number differs between genomic sequences. To lift this limitation, we formulate a novel definition of a chain that allows overlaps proportional to the fragments' lengths, and we present an efficient algorithm for computing a maximum weighted chain under this definition. We tested our algorithm on a dataset of 694 genome pairs and observed significant improvements in coverage, while keeping running times within reasonable limits.
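To make the idea of a chain with proportional overlaps concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes a hypothetical precedence rule in which a fragment may precede another if it starts and ends strictly earlier on both genomes and the overlap on each genome is at most `ratio` times the length of the shorter of the two fragments. It uses a simple quadratic dynamic program and sums fragment weights without correcting for overlapping regions. The names `Fragment`, `may_precede`, `best_chain`, and `ratio` are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """A local alignment between two genomes (coordinates are inclusive)."""
    x_start: int
    x_end: int
    y_start: int
    y_end: int
    weight: float


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def may_precede(f: Fragment, g: Fragment, ratio: float) -> bool:
    """Assumed rule: f precedes g if it starts and ends earlier on both
    genomes and each overlap is at most ratio * (shorter fragment length)."""
    if not (f.x_start < g.x_start and f.y_start < g.y_start):
        return False
    if not (f.x_end < g.x_end and f.y_end < g.y_end):
        return False
    intervals = (
        (f.x_start, f.x_end, g.x_start, g.x_end),
        (f.y_start, f.y_end, g.y_start, g.y_end),
    )
    for fs, fe, gs, ge in intervals:
        limit = ratio * min(fe - fs + 1, ge - gs + 1)
        if overlap(fs, fe, gs, ge) > limit:
            return False
    return True


def best_chain(fragments: List[Fragment], ratio: float) -> Tuple[float, List[Fragment]]:
    """Quadratic-time dynamic program for a maximum weighted chain.

    Weights are simply summed; overlapping regions are not discounted,
    which is a simplification made for this sketch.
    """
    # Sorting by x_start guarantees every valid predecessor comes earlier.
    frags = sorted(fragments)
    if not frags:
        return 0.0, []
    score = [f.weight for f in frags]
    prev = [-1] * len(frags)
    for i, g in enumerate(frags):
        for j in range(i):
            candidate = score[j] + g.weight
            if candidate > score[i] and may_precede(frags[j], g, ratio):
                score[i] = candidate
                prev[i] = j
    # Trace back from the best-scoring chain end.
    end = max(range(len(frags)), key=score.__getitem__)
    total = score[end]
    chain = []
    while end != -1:
        chain.append(frags[end])
        end = prev[end]
    chain.reverse()
    return total, chain
```

With `ratio = 0` the sketch reduces to classical chaining without overlaps; raising `ratio` lets slightly overlapping local alignments join the same chain, which is the effect on coverage that the abstract refers to.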
