Hybrid MPI/OpenMP Strategy for Biological Multiple Sequence Alignment with DIALIGN-TX in Heterogeneous Multicore Clusters

Multiple Sequence Alignment (MSA) is a fundamental problem in Bioinformatics that aims to align more than two biological sequences in order to highlight regions of similarity. The problem is known to be NP-complete, so heuristic methods are used to solve it approximately. DIALIGN-TX is an iterative heuristic method for MSA that is based on dynamic programming and builds alignments by concatenating high-similarity ungapped regions. This paper proposes a hybrid MPI/OpenMP master/slave parallel strategy, supporting multiple allocation policies, to run DIALIGN-TX in heterogeneous multicore clusters. Results obtained on a 28-core heterogeneous cluster with real sequence sets show that the execution time can be drastically reduced. We also show that an appropriate choice of allocation policy and of master node has a great impact on overall system performance.
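The abstract does not spell out the allocation policies, so the following is only a minimal sketch of why the policy matters on a heterogeneous cluster. It assumes that the master distributes independent, equal-cost pairwise comparison tasks and that each slave's relative speed is known in advance. Both are assumptions, and every function name and speed value is hypothetical. The code compares an equal static split, a speed-weighted static split, and guided self-scheduling, in which an idle slave receives ceil(remaining / P) tasks. It simulates finish times only and makes no MPI or OpenMP calls.

```python
import heapq
import math
from itertools import combinations


def pairwise_tasks(n_seqs):
    """Hypothetical work units: every unordered pair (i, j), i < j, i.e. n*(n-1)/2 tasks."""
    return list(combinations(range(n_seqs), 2))


def equal_static(n_tasks, n_workers):
    """Naive policy: give every slave the same number of tasks."""
    base, extra = divmod(n_tasks, n_workers)
    return [base + (1 if k < extra else 0) for k in range(n_workers)]


def weighted_static(n_tasks, speeds):
    """Speed-aware policy: shares proportional to each slave's relative speed.

    Largest-remainder rounding keeps the shares summing exactly to n_tasks.
    """
    total = sum(speeds)
    exact = [n_tasks * s / total for s in speeds]
    shares = [math.floor(x) for x in exact]
    leftover = n_tasks - sum(shares)
    by_remainder = sorted(range(len(speeds)),
                          key=lambda k: exact[k] - shares[k], reverse=True)
    for k in by_remainder[:leftover]:
        shares[k] += 1
    return shares


def static_makespan(shares, speeds, cost=1.0):
    """Finish time when slave k runs shares[k] tasks of equal cost at speeds[k]."""
    return max(n * cost / s for n, s in zip(shares, speeds))


def gss_makespan(n_tasks, speeds, cost=1.0):
    """Simulate guided self-scheduling: whenever a slave goes idle, the master
    hands it ceil(remaining / P) tasks, so chunks shrink as the pool drains."""
    p = len(speeds)
    remaining = n_tasks
    idle = [(0.0, k) for k in range(p)]  # (time the slave becomes idle, slave id)
    heapq.heapify(idle)
    finish = 0.0
    while remaining > 0:
        t, k = heapq.heappop(idle)
        chunk = math.ceil(remaining / p)
        remaining -= chunk
        done = t + chunk * cost / speeds[k]
        finish = max(finish, done)
        heapq.heappush(idle, (done, k))
    return finish


if __name__ == "__main__":
    speeds = [1.0, 1.0, 2.0, 4.0]      # hypothetical relative slave speeds
    n = len(pairwise_tasks(100))       # 4950 pairwise comparisons
    print(static_makespan(equal_static(n, len(speeds)), speeds))   # 1238.0
    print(static_makespan(weighted_static(n, speeds), speeds))     # 619.0
    print(gss_makespan(n, speeds))                                 # 1238.0
```

With these made-up speeds, the speed-weighted split halves the finish time compared with the equal split. Plain guided self-scheduling gains nothing here. Its first and largest chunk (1238 tasks) goes to slave 0, the slowest, and that chunk alone determines the finish time. This toy run shows why a policy that ignores node speed can waste much of a heterogeneous cluster.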
