Divide-and-Conquer Algorithm for ClustalW-MPI

Multiple sequence alignment remains an active field of research in computational biology. The most widely used tool for it is ClustalW, which performs alignment in three steps: pairwise alignment, guide tree generation, and progressive alignment. ClustalW-MPI is a parallel implementation of ClustalW. This paper implements a divide-and-conquer approach that still uses ClustalW-MPI for the alignment itself but achieves better speedup than ClustalW-MPI alone. The sequences are first cut into smaller subsequences with a divide-and-conquer technique to reduce the computational space. These subsequences are then distributed to the available processors through the Message Passing Interface (MPI), and the processors align them simultaneously by running ClustalW-MPI. The aligned results are sent back to the main processor, which concatenates them to produce the final alignment. This approach may compromise some alignment quality, because gaps can be introduced at the start or end of the aligned subsequences. Heuristic methods for fixing the cut points, such as overlapping alignment and sliding-window alignment, are therefore suggested as future improvements.
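The cut, distribute, align, and concatenate pipeline described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the proportional cut points, and the thread pool (standing in for MPI dispatch) are all assumptions made for illustration, and `align_block` is a hypothetical placeholder that only pads with trailing gaps where a real run would invoke ClustalW-MPI.

```python
from concurrent.futures import ThreadPoolExecutor


def split_sequences(seqs, n_blocks):
    """Cut every sequence into n_blocks pieces at proportional positions.

    Proportional cut points are an assumption; the abstract does not
    say how the cut points are chosen.
    """
    blocks = [[] for _ in range(n_blocks)]
    for seq in seqs:
        for b in range(n_blocks):
            start = len(seq) * b // n_blocks
            end = len(seq) * (b + 1) // n_blocks
            blocks[b].append(seq[start:end])
    return blocks


def align_block(block):
    """Hypothetical stand-in for one ClustalW-MPI run on a block.

    It only pads shorter subsequences with trailing gaps so that all
    rows in the block have equal width; it performs no real alignment.
    """
    width = max(len(s) for s in block)
    return [s.ljust(width, "-") for s in block]


def divide_and_conquer_align(seqs, n_blocks, align=align_block):
    """Split, align each block concurrently, then concatenate row by row."""
    blocks = split_sequences(seqs, n_blocks)
    # The thread pool stands in for sending blocks to MPI worker processes.
    with ThreadPoolExecutor() as pool:
        aligned = list(pool.map(align, blocks))
    # The "main processor" step: join the aligned pieces of each sequence.
    return ["".join(block[i] for block in aligned) for i in range(len(seqs))]


if __name__ == "__main__":
    for row in divide_and_conquer_align(["ACGTACGT", "ACGTAC", "ACGACGT"], 2):
        print(row)
```

Even with this toy aligner, the second row gains a gap in the middle of the sequence, at the end of its first block. That is the kind of cut-point artifact the abstract attributes to gaps at the start or end of aligned subsequences.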
