A Parallel Algorithm for Large-Scale Multiple Sequence Alignment

Multiple sequence alignment is a central and extensively studied problem in computational biology. Two or more protein sequences are compared to evaluate their similarity and to identify conserved regions. This work reports a methodology for the parallel processing of a multiple sequence alignment algorithm (ClustalW) on a network of computers. The modules that compose the distributed system are described in detail, with special attention to how the dynamic programming algorithm is run with multilevel parallelism. Extensive experiments were carried out to evaluate the performance and scalability of the method. The results suggest that the proposed method is promising for large-scale multiple protein sequence alignment.
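
As a rough illustration of where parallelism can enter such a pipeline, the sketch below distributes the all-against-all pairwise comparisons, the first stage of ClustalW, over a pool of workers. It is not the method of this work: the function names `nw_score` and `pairwise_scores` are hypothetical, the scoring scheme (match, mismatch, linear gap penalty) is a simplifying assumption in place of ClustalW's substitution matrices and affine gap penalties, and a local thread pool stands in for networked computers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score by dynamic programming (Needleman-Wunsch).

    Only two rows of the table are kept, since the score alone is needed.
    The scoring parameters are illustrative assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a aligned against gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_scores(seqs, workers=4):
    """Score every pair of sequences, one independent task per pair."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda p: nw_score(seqs[p[0]], seqs[p[1]]), pairs)
        return dict(zip(pairs, scores))
```

Each pair is independent of the others, so this stage divides naturally among workers. A finer level of parallelism would split the dynamic programming table of a single pair, which this sketch does not attempt.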
