Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation

ClustalW, a multiple sequence alignment program, is widely used to compare protein sequences but is time consuming. It has two main time-consuming stages: pairwise alignment and progressive alignment. Because progressive alignment performs irregular, tree-based computation, available parallel programs cannot achieve reasonable speedups for large numbers of sequences. In this paper, progressive alignment is reduced to a tree accumulation problem. Previous efficient parallel tree accumulation algorithms ignore load balancing. We propose a load balancing strategy for parallelizing tree accumulation in progressive alignment. The resulting parallel progressive alignment algorithm, based on tree accumulation with load balancing, greatly reduces the overall running time and achieves reasonable speedups.
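
To illustrate what reducing progressive alignment to tree accumulation means, the following minimal Python sketch computes an upward accumulation over a small guide tree and then distributes independent subtrees across workers. It is an illustration under stated assumptions, not the algorithm of this paper: the names Node, merge, accumulate, leaf_count, and balance are hypothetical; merge is a placeholder for profile alignment; subtree cost is approximated by leaf count; and the greedy largest-first assignment is a generic heuristic standing in for the proposed load balancing strategy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Guide-tree node (hypothetical): a leaf has no children, an internal node has two."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_leaf(node: Node) -> bool:
    return node.left is None and node.right is None


def merge(a: List[str], b: List[str]) -> List[str]:
    # Placeholder for aligning two profiles; here it only joins the member lists.
    return a + b


def accumulate(node: Node) -> List[str]:
    """Upward accumulation: a node's value is the merge of its children's values."""
    if is_leaf(node):
        return [node.name]
    return merge(accumulate(node.left), accumulate(node.right))


def leaf_count(node: Node) -> int:
    # Assumed cost model: work in a subtree grows with its number of sequences.
    if is_leaf(node):
        return 1
    return leaf_count(node.left) + leaf_count(node.right)


def balance(subtrees: List[Node], workers: int) -> Tuple[List[List[Node]], List[int]]:
    """Greedy largest-first assignment of independent subtrees to workers."""
    loads = [0] * workers
    assignment: List[List[Node]] = [[] for _ in range(workers)]
    for tree in sorted(subtrees, key=leaf_count, reverse=True):
        target = loads.index(min(loads))  # currently least-loaded worker
        assignment[target].append(tree)
        loads[target] += leaf_count(tree)
    return assignment, loads


def example_tree() -> Node:
    # Guide tree ((A,B),(C,(D,E))) over five hypothetical sequences.
    ab = Node("AB", Node("A"), Node("B"))
    de = Node("DE", Node("D"), Node("E"))
    cde = Node("CDE", Node("C"), de)
    return Node("root", ab, cde)
```

In this sketch, the subtrees passed to balance are independent, so each worker can accumulate its share without communication; the remaining merges near the root are then performed on the partial results. Leaf count is only a rough proxy, since the cost of a real profile alignment also depends on sequence lengths.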