Developing a Protein Interaction Prediction Algorithm on HPC

The prediction of protein-protein interactions is one of the fundamental problems in bioinformatics. A novel algorithm called STRIKE has been shown to perform well at protein-protein interaction prediction. It assumes that two proteins interact if they contain similar substrings of amino acids. In this paper, we develop a parallel version of the STRIKE algorithm and implement it on a cluster system. On short protein sequence sets, increasing the number of compute nodes from one to six reduced the overall execution time of the parallel implementation by a factor of about five. Key optimizations to the implementation are also discussed.
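The abstract does not spell out STRIKE's kernel or how the work is split across nodes. The sketch below is therefore only an illustration under stated assumptions. It uses a plain normalized k-spectrum string kernel (cosine similarity of k-mer counts) as a stand-in for "similar substrings of amino acids", and it distributes the independent pairwise comparisons round-robin over a hypothetical number of nodes. The function names (`spectrum_kernel`, `partition_pairs`, `node_work`, `parallel_efficiency`) and the toy sequences are hypothetical. "Nodes" are simulated in a single process rather than with a real cluster runtime.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def spectrum(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of an amino-acid sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Normalized k-spectrum kernel (a stand-in, not STRIKE's exact kernel).

    Returns the cosine similarity of the two k-mer count vectors:
    1.0 for identical k-mer profiles, 0.0 when no k-mer is shared.
    """
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * sb[kmer] for kmer, count in sa.items())
    norm = sqrt(sum(c * c for c in sa.values()) * sum(c * c for c in sb.values()))
    return dot / norm if norm else 0.0


def partition_pairs(n_seqs: int, n_nodes: int) -> list[list[tuple[int, int]]]:
    """Hypothetical decomposition: deal all unordered pairs round-robin to nodes."""
    buckets: list[list[tuple[int, int]]] = [[] for _ in range(n_nodes)]
    for idx, pair in enumerate(combinations(range(n_seqs), 2)):
        buckets[idx % n_nodes].append(pair)
    return buckets


def node_work(seqs: list[str], pairs: list[tuple[int, int]], k: int = 3) -> dict:
    """Work done independently by one node: kernel values for its assigned pairs."""
    return {(i, j): spectrum_kernel(seqs[i], seqs[j], k) for i, j in pairs}


def parallel_efficiency(speedup: float, n_nodes: int) -> float:
    """Efficiency = speedup / number of nodes (1.0 would be ideal linear scaling)."""
    return speedup / n_nodes


# Toy demo: two simulated nodes, results gathered into one dictionary.
seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSGGSGGSG"]
results: dict = {}
for node_pairs in partition_pairs(len(seqs), n_nodes=2):
    results.update(node_work(seqs, node_pairs))
```

Because every pair is scored independently, the pair list can be split without communication until the final gather, which is what makes this kind of workload a natural fit for a cluster. Using the abstract's own figures, a speedup of about 5 on 6 nodes corresponds to `parallel_efficiency(5, 6)`, roughly 0.83.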
