A high-throughput gene sequence alignment strategy using parallel computing

Sequence alignment is an important part of biological big data computing. A number of platforms currently provide sequence alignment services, and they differ in speed, accuracy, limitations, and other respects. In this article we propose a strategy for gene sequence alignment based on parallel computing. An alignment algorithm that offers higher accuracy and broader applicability than most alternatives is introduced and improved to ensure both the accuracy and the efficiency of the calculation. Furthermore, parallel computing lets the strategy draw on more computing resources and therefore compute faster. We tested the strategy on data from the National Center for Biotechnology Information (NCBI) genome database. The results show that the strategy can complete the calculation task under a general computing resource allocation, and that for large workloads its running time is 31% to 36% of that of the original method. The strategy will provide a fundamental technology for sequence alignment applications on cloud computing platforms.
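To make the general idea concrete, the following is a minimal sketch of how independent pairwise alignments could be distributed across worker processes. It is an illustration under stated assumptions, not the method proposed in this article: the abstract does not name the alignment algorithm, so a textbook Needleman-Wunsch global alignment score stands in for it, and the function names `nw_score` and `align_all`, the scoring parameters, and the example sequences are all hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b (Needleman-Wunsch, linear gap penalty).

    Stand-in algorithm; the scoring values are assumptions for illustration.
    """
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def align_all(query, targets, executor=None):
    """Score query against every target.

    Each comparison is independent, so the work can be handed to an
    executor; with no executor the comparisons run serially.
    """
    score = partial(nw_score, query)
    mapper = executor.map if executor is not None else map
    return list(mapper(score, targets))


if __name__ == "__main__":
    # Hypothetical toy sequences; real inputs would come from a genome database.
    targets = ["GATTACA", "GCATGCU", "GATTTACA", "TTTT"]
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(align_all("GATTACA", targets, executor=pool))
```

Because each query-target comparison shares no state with the others, adding workers mainly reduces wall-clock time when individual alignments are large enough to outweigh the cost of sending sequences between processes.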