Optimal Distributed Multiple Sequence Alignment Using Conformal Computing Methods

Multiple sequence alignment (MSA) is a common bioinformatics technique used in biological and medical research to study the function, structure and evolution of genes and proteins. The algorithm that computes the optimal solution to the MSA problem is well understood, but it is impractical to run even on high-performance computers because it cannot easily be distributed across multiple processors. We are redesigning the optimal MSA method to facilitate its deployment on supercomputers. This will allow high-performance and distributed computing platforms, which are becoming more prevalent in biological research, to be harnessed to calculate reference alignments for gene and protein sequences and to identify sequence regions shared by a group of sequences (multiple local sequence alignment). With the proposed partitioning scheme and an optimized communication cost, the exponential growth in time and memory requirements was found to be compensated by exponential parallelism.
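To show where the exponential cost and the matching exponential parallelism come from, here is a minimal sketch of the textbook exact MSA dynamic program over the N-dimensional lattice of prefix lengths. It is not the authors' conformal-computing partitioning scheme, which this abstract does not describe. The sum-of-pairs objective, the linear gap penalty and the constants MATCH, MISMATCH and GAP are illustrative assumptions, as are the names optimal_msa_score and wavefront_sizes. For N sequences of lengths L_i the lattice has prod(L_i + 1) cells and each cell has up to 2^N - 1 predecessors. However, all cells on one wavefront (constant index sum) depend only on earlier wavefronts, so each wavefront could be computed in parallel.

```python
from collections import defaultdict
from itertools import product

# Illustrative scoring scheme (an assumption, not taken from the source).
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one pair of characters in a column; '-' denotes a gap."""
    if a == '-' and b == '-':
        return 0
    if a == '-' or b == '-':
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(column):
    """Sum-of-pairs score of a single alignment column."""
    k = len(column)
    return sum(pair_score(column[i], column[j])
               for i in range(k) for j in range(i + 1, k))


def optimal_msa_score(seqs):
    """Exact sum-of-pairs MSA score via DP over the N-dimensional lattice.

    Cells are grouped into wavefronts by index sum; every cell in a
    wavefront depends only on cells in earlier wavefronts.
    """
    n = len(seqs)
    dims = [len(s) + 1 for s in seqs]
    # Each nonzero 0/1 vector says which sequences emit a residue in a column.
    steps = [s for s in product((0, 1), repeat=n) if any(s)]
    levels = defaultdict(list)
    for cell in product(*(range(d) for d in dims)):
        levels[sum(cell)].append(cell)

    score = {}
    for level in sorted(levels):
        for cell in levels[level]:  # independent cells: parallelizable
            if level == 0:
                score[cell] = 0
                continue
            best = None
            for step in steps:
                prev = tuple(c - s for c, s in zip(cell, step))
                if min(prev) < 0:
                    continue
                column = [seqs[i][cell[i] - 1] if step[i] else '-'
                          for i in range(n)]
                cand = score[prev] + column_score(column)
                if best is None or cand > best:
                    best = cand
            score[cell] = best
    return score[tuple(d - 1 for d in dims)]


def wavefront_sizes(lengths):
    """Number of mutually independent lattice cells on each wavefront."""
    counts = defaultdict(int)
    for cell in product(*(range(m + 1) for m in lengths)):
        counts[sum(cell)] += 1
    return [counts[k] for k in range(sum(lengths) + 1)]
```

Under these assumed scores, optimal_msa_score(["AC", "AC", "AC"]) returns 6: two columns of three matching pairs each. wavefront_sizes([1, 1, 1]) returns [1, 3, 3, 1]. As more sequences are added, the widest wavefront grows roughly as fast as the lattice itself, which is the kind of parallelism the abstract refers to.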