Partnership for Advanced Computing in Europe (prace-ri.eu)

Optimization of Multiple Sequence Alignment Software ClustalW

* Corresponding author. E-mail address: sko@nsc.liu.se
‡ Corresponding author. E-mail address: pborovska@tu-sofia.bg
† Corresponding author. E-mail address: vgan@tu-sofia.bg

Abstract

This activity within the PRACE-2IP project aims to investigate and improve the performance of the multiple sequence alignment software ClustalW on the BlueGene/Q supercomputer JUQUEEN, using influenza virus sequences as a case study. The code has been ported, tuned, profiled, and scaled on this system. A parallel I/O interface has been designed for efficient input of the sequence dataset. The processes are divided into sub-groups, and in each sub-group a local master performs the read operation and broadcasts the dataset to its slaves. The optimal group size has been investigated, and the effect of the read buffer size on read performance has been measured experimentally. Applied to ClustalW, the implementation with parallel I/O gives considerably better I/O performance than the original code. It achieves a speed-up of up to 6.8 times for dataset input when using 8192 JUQUEEN cores.
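The grouped input scheme summarized above, in which local masters read and then broadcast to their slaves, can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the ClustalW implementation. The helper names (`group_of`, `local_master`, `read_in_chunks`, `grouped_input`) are hypothetical. Copying a master's buffer to the other ranks stands in for an MPI broadcast inside a sub-group communicator, which could, for example, be created with `MPI_Comm_split` using the group index as the color. The chunked read only shows where a read buffer size parameter would enter.

```python
import io


def group_of(rank: int, group_size: int) -> int:
    """Sub-group index of a rank (hypothetical; in MPI this could serve as
    the `color` argument of MPI_Comm_split)."""
    return rank // group_size


def local_master(rank: int, group_size: int) -> int:
    """Global rank of the reader for this rank's sub-group: the lowest rank
    in the group (an assumed convention for this sketch)."""
    return group_of(rank, group_size) * group_size


def read_in_chunks(stream, buf_size: int):
    """Read the whole stream using reads of at most buf_size bytes.

    Returns the data and the number of read calls, including the final
    empty read that signals end of file.
    """
    chunks, calls = [], 0
    while True:
        chunk = stream.read(buf_size)
        calls += 1
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks), calls


def grouped_input(open_dataset, nprocs: int, group_size: int, buf_size: int):
    """Simulate grouped input: only local masters open the file, and every
    other rank gets a copy of its master's data (standing in for a
    broadcast within the sub-group). Returns per-rank data and the total
    number of read calls issued against the file."""
    data_per_rank = [None] * nprocs
    file_reads = 0
    for rank in range(nprocs):
        master = local_master(rank, group_size)
        if rank == master:
            with open_dataset() as f:
                data_per_rank[rank], calls = read_in_chunks(f, buf_size)
            file_reads += calls
        else:
            # master <= rank, so the master's data is already filled in.
            data_per_rank[rank] = data_per_rank[master]
    return data_per_rank, file_reads


if __name__ == "__main__":
    dataset = b">seq1\nACGT\n>seq2\nACGA\n"  # tiny FASTA-like stand-in
    data, reads = grouped_input(lambda: io.BytesIO(dataset),
                                nprocs=8, group_size=4, buf_size=8)
    print(all(d == dataset for d in data), reads)
```

In this toy model, `group_size` controls how many ranks touch the file and how many ranks each master must serve. `buf_size` controls how many read calls each master issues. These are the two parameters the abstract reports as studied. Real costs on JUQUEEN would depend on the file system and interconnect, which this sketch does not model.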