Algorithms for Molecular Biology

Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores. Results: We introduce an algorithm computing an optimal N-map with time complexity O (|s| × |t| × N) using O (|s| × |t| × N) memory space. Among all the numbers of parts taken in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with a reasonable accuracy. We test the functionality of the approach over random sequences on which we apply artificial evolutionary events. Practical Application: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.

[1]  D. Sankoff,et al.  Comparative Genomics: "Empirical And Analytical Approaches To Gene Order Dynamics, Map Alignment And The Evolution Of Gene Families" , 2000 .

[2]  Ron Unger,et al.  Swaps in protein sequences , 2002, Proteins.

[3]  S. Karlin,et al.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Kaizhong Zhang,et al.  Algorithmic approaches for genome rearrangement: a review , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[5]  H. Quesneville,et al.  Recurrent exon shuffling between distant P-element families. , 2003, Molecular biology and evolution.

[6]  Inna Dubchak,et al.  Glocal alignment: finding rearrangements during alignment , 2003, ISMB.

[7]  Jean-Paul Delahaye,et al.  Transformation distances: a family of dissimilarity measures based on movements of segments , 1999, Bioinform..

[8]  Eric Rivals,et al.  Comparison of Minisatellites , 2003, J. Comput. Biol..

[9]  Ronald L. Rivest,et al.  Introduction to Algorithms, Second Edition , 2001 .

[10]  Benjamin J. Raphael,et al.  A novel method for multiple alignment of sequences with repeated and shuffled elements. , 2004, Genome research.

[11]  D Sankoff,et al.  Matching sequences under deletion-insertion constraints. , 1972, Proceedings of the National Academy of Sciences of the United States of America.

[12]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[13]  Niklas Eriksen,et al.  Measuring Genome Divergence in Bacteria: A Case Study Using Chlamydian Data , 2002, Journal of Molecular Evolution.

[14]  Jean-Marc Steyaert,et al.  On the Transformation Distance Problem , 2004, SPIRE.