Review of Bioinformatics Application Using Intel MIC

With the rapid development of next-generation sequencing (NGS) technology, the ever-increasing volume of biological sequence data poses a tremendous challenge to data processing. Using intensive computing power to speed up data analysis has therefore become a hot topic. Among state-of-the-art parallel accelerators, Intel Xeon Phi is available both as a coprocessor and as a bootable host processor. It is based on the Intel Many Integrated Core (MIC) architecture and provides massive parallelism and vectorization to support the most demanding high-performance computing (HPC) applications. Its underlying x86 architecture supports common parallel programming standards and libraries. This gives developers familiarity and flexibility when porting existing code to heterogeneous computing environments. In addition, it offers three usage models (native, offload, and symmetric) for solving different application problems on MIC-based neo-heterogeneous architectures. Intel Xeon Phi is becoming a widely used parallel computing platform for reducing the computational cost of the most demanding processes in bioinformatics. To help researchers make better use of MIC, we survey MIC-based bioinformatics applications in genomics, proteomics, pharmacology, phylogenetics, and epigenetics. We believe this review provides a comprehensive guideline for bioinformatics researchers who wish to apply MIC in their own fields.
