Improvements on bicriteria pairwise sequence alignment: algorithms and applications

MOTIVATION: In this article, we consider the bicriteria pairwise sequence alignment problem and propose extensions of dynamic programming algorithms for several problem variants, together with a novel pruning technique that efficiently reduces the number of states to be processed. Moreover, we present a method for the construction of phylogenetic trees based on this bicriteria framework. Two exemplary cases are discussed.

RESULTS: Numerical results on a real dataset show that this approach is very fast in practice. The pruning technique saves up to 90% in memory usage and 80% in CPU time. Based on this method, phylogenetic trees are constructed from real-life data. These trees provide complementary information, and some of them match those obtained by the Maximum Likelihood method.

AVAILABILITY AND IMPLEMENTATION: Source code is freely available for download at http://eden.dei.uc.pt/paquete/MOSAL. It is implemented in C and supported on Linux, Mac OS and MS Windows.
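To make the bicriteria setting concrete, the following minimal sketch extends the classical dynamic programming recurrence so that each cell stores a set of nondominated score pairs instead of a single score. It assumes one common choice of criteria, maximizing the number of matches while minimizing the number of indels; the abstract does not spell out the criteria or the recurrence, so this choice, the function names `pareto_filter` and `bicriteria_align_scores`, and the use of Python rather than the authors' C code are all illustrative assumptions. The filtering shown is plain dominance filtering and is not the pruning technique proposed in the article.

```python
def pareto_filter(points):
    """Keep only nondominated (matches, indels) pairs.

    Assumed criteria: more matches is better, fewer indels is better.
    """
    best = []
    # Scan by decreasing matches (ties: increasing indels); a pair survives
    # only if it has strictly fewer indels than every pair kept so far.
    for m, d in sorted(set(points), key=lambda p: (-p[0], p[1])):
        if not best or d < best[-1][1]:
            best.append((m, d))
    return best


def bicriteria_align_scores(a, b):
    """Return the nondominated (matches, indels) pairs over all global
    alignments of sequences a and b (illustrative sketch only)."""
    n, m = len(a), len(b)
    S = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                # Aligning a prefix against the empty prefix needs only indels.
                S[i][j] = [(0, i + j)]
                continue
            match = 1 if a[i - 1] == b[j - 1] else 0
            # Diagonal move: align a[i-1] with b[j-1] (match or mismatch).
            cand = [(x + match, y) for x, y in S[i - 1][j - 1]]
            # Vertical and horizontal moves: insert one indel.
            cand += [(x, y + 1) for x, y in S[i - 1][j]]
            cand += [(x, y + 1) for x, y in S[i][j - 1]]
            S[i][j] = pareto_filter(cand)
    return S[n][m]


if __name__ == "__main__":
    print(bicriteria_align_scores("AB", "BA"))
```

For the sequences "AB" and "BA" there are two trade-offs under these assumptions: one match at the cost of two indels, or no matches with no indels. A full implementation would also need traceback information to recover the alignments themselves.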
