Efficient computation of motif discovery on Intel Many Integrated Core (MIC) Architecture

Background: Detecting novel sequence motifs is becoming increasingly important in computational biology. However, high computational cost severely limits the efficiency of most motif discovery algorithms.

Results: In this paper, we accelerate the MEME algorithm on the Intel Many Integrated Core (MIC) Architecture and present a parallel implementation of MEME, called MIC-MEME, based on a hybrid CPU/MIC computing framework. Our method focuses on parallelizing the starting-point search and improving the iteration update strategy of the algorithm. Benchmarked on an experimental platform with two Xeon Phi 3120 coprocessors, MIC-MEME achieves average overall-runtime speedups of 26.6 for the ZOOPS model and 30.2 for the OOPS model.

Conclusions: MIC-MEME has been compared with state-of-the-art methods and shows good scalability with respect to both dataset size and the number of MICs. Source code: https://github.com/hkwkevin28/MIC-MEME.
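The starting-point search mentioned in the Results is a natural target for parallelization. MEME-style searches treat subsequences in the data as candidate seeds, and each candidate can be scored independently of the others. The following is a minimal sketch of that decomposition under several stated assumptions. It handles DNA only and uses a simplified seeding rule in which the seed letter gets probability `x = 0.5` and the other letters share the remainder. It also assumes a uniform background and an OOPS-style score that sums each sequence's best-site log-likelihood ratio. All function names are hypothetical, and this is not MIC-MEME's code. Threads are used only to show the independent work units. CPython's GIL limits real speedup here, whereas a C implementation would distribute the same loop with OpenMP across CPU and coprocessor cores.

```python
import math
from concurrent.futures import ThreadPoolExecutor

ALPHABET = "ACGT"


def seed_pwm(subseq, x=0.5):
    """Hypothetical seeding rule: the subsequence letter gets probability x,
    and the remaining letters share 1 - x equally."""
    other = (1.0 - x) / (len(ALPHABET) - 1)
    return [{a: (x if a == c else other) for a in ALPHABET} for c in subseq]


def site_llr(pwm, site, bg=0.25):
    """Log-likelihood ratio of one site under the PWM vs a uniform background."""
    return sum(math.log(col[c] / bg) for col, c in zip(pwm, site))


def oops_score(pwm, seqs):
    """OOPS-style score: every sequence contributes its single best site."""
    w = len(pwm)
    return sum(
        max(site_llr(pwm, s[i:i + w]) for i in range(len(s) - w + 1))
        for s in seqs
    )


def best_starting_point(seqs, w, workers=4):
    """Score every distinct width-w substring as a seed and return the best."""
    # Sorting makes the result deterministic; ties go to the first candidate.
    candidates = sorted({s[i:i + w] for s in seqs for i in range(len(s) - w + 1)})
    # Each candidate is scored independently, so this map is embarrassingly parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda c: oops_score(seed_pwm(c), seqs), candidates))
    best = max(range(len(candidates)), key=lambda k: scores[k])
    return candidates[best], scores[best]


seqs = ["TTTACGTTT", "GGACGTGG", "CACGTCCA"]
print(best_starting_point(seqs, 4))
```

Here `ACGT` is the only 4-mer that occurs in all three sequences, so it is selected as the seed. In a real run the chosen seed would then be refined by EM iterations, which are the other part of the algorithm the abstract says was optimized.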
