Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors

Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, the repository of scientific codes available for this platform is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. The structure alignment portion of eFindSite was parallelized for the Intel Xeon Phi platform using pragma-based OpenMP, which yields the desired performance improvements and scales well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card, compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward and requires a short development time, because the coprocessor supports common parallel programming models. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
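The relation between the quoted wall-clock times and the reported speedups can be checked with a short calculation. The sketch below is illustrative only: it assumes the usual definition of speedup as serial time divided by parallel time, and the constant and function names are hypothetical, not part of eFindSite.

```python
# Wall-clock times (hours) quoted in the abstract for 501 benchmarking proteins.
SERIAL_HOURS = 36.8
CPU_ONLY_HOURS = 3.1       # parallel run on one node without the accelerator
CPU_PLUS_PHI_HOURS = 2.1   # parallel run on one node with the Xeon Phi card


def speedup(serial_time: float, parallel_time: float) -> float:
    """Return the speedup, assumed here to be serial time / parallel time."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return serial_time / parallel_time


cpu_speedup = speedup(SERIAL_HOURS, CPU_ONLY_HOURS)
hybrid_speedup = speedup(SERIAL_HOURS, CPU_PLUS_PHI_HOURS)
hours_saved_by_phi = CPU_ONLY_HOURS - CPU_PLUS_PHI_HOURS

print(f"CPU only:       {cpu_speedup:.1f}x")
print(f"CPU + Xeon Phi: {hybrid_speedup:.1f}x")
print(f"Time saved by adding the coprocessor: {hours_saved_by_phi:.1f} h")
```

The ratios computed from the rounded times land close to the reported 11.8 and 17.6; the small differences are presumably due to rounding of the quoted hours.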
