Quasi-Optimal Multiple Sequence Alignments

Finding an optimal multiple sequence alignment (MSA) of three or more nucleic acid or amino acid sequences is a fundamental problem in bioinformatics, with a large body of publications and citations over the last 30 years. Given a set of sequences, an optimal MSA identifies homologous characters, that is, characters that share common ancestry. The resulting MSA is used in many downstream applications in medical and health informatics, such as constructing phylogenetic trees, finding protein families, predicting the secondary and tertiary structure of new sequences, and demonstrating homology between new sequences and existing families. Unfortunately, techniques that work well for pairwise alignment often become too computationally expensive when applied to multiple sequence alignment, due to the extremely large size of the search space. In fact, it is common for multiple sequence alignment problems to become computationally intractable. This is because multiple sequence alignment is a combinatorial problem: the computational time required to perform an alignment grows rapidly with the length of the sequences and exponentially with their number. That is, for n sequences of length l, computing the optimal alignment exactly has a computational complexity of O(l^n). Thus, dynamic programming techniques such as the Needleman-Wunsch algorithm are guaranteed to produce optimal solutions to multiple sequence alignment problems, but are generally impractical for all but the smallest examples. In fact, multiple sequence alignment under the sum-of-pairs scoring scheme is NP-complete. As a result, most currently employed multiple sequence alignment algorithms are based on heuristics and must settle for providing a quasi-optimal alignment.
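To make the cost argument concrete, the following is a minimal Python sketch of pairwise Needleman-Wunsch scoring together with a count of the dynamic programming table cells needed when the same idea is extended to n sequences. The scoring values (match = 1, mismatch = -1, linear gap = -1) and the function names are illustrative assumptions, not taken from the paper.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def dp_table_cells(n, l):
    """Cells in the n-dimensional DP table for n sequences of length l."""
    return (l + 1) ** n


if __name__ == "__main__":
    print(needleman_wunsch_score("AC", "AGC"))
    for n in (2, 3, 5, 10):
        print(n, dp_table_cells(n, 100))
```

For two sequences of length 100 the table has about ten thousand cells; for ten such sequences it has more than 10^20, which is why exact dynamic programming is rarely attempted beyond a handful of sequences.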
