On accelerating iterative algorithms with CUDA: A case study on Conditional Random Fields training algorithm for biological sequence alignment

The accuracy of Conditional Random Fields (CRF) is achieved at the cost of huge amount of computation to train model. In this paper we designed the parallelized algorithm for the Gradient Ascent based CRF training methods for biological sequence alignment. Our contribution is mainly on two aspects: 1) We flexibly parallelized the different iterative computation patterns, and the according optimization methods are presented. 2) As for the Gibbs Sampling based training method, we designed a way to automatically predict the iteration round, so that the parallel algorithm could be run in a more efficient manner. In the experiment, these parallel algorithms achieved valuable accelerations comparing to the serial version.

[1]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[2]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[3]  Graciela Gonzalez,et al.  BANNER: An Executable Survey of Advances in Biomedical Named Entity Recognition , 2007, Pacific Symposium on Biocomputing.

[4]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[5]  Folker Meyer,et al.  Rose: generating sequence families , 1998, Bioinform..

[6]  Yang Liu,et al.  GPU Accelerated Smith-Waterman , 2006, International Conference on Computational Science.

[7]  Nikil D. Dutt,et al.  A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors , 2009, Neural Networks.

[8]  William W. Cohen,et al.  Semi-Markov Conditional Random Fields for Information Extraction , 2004, NIPS.

[9]  Andrew McCallum,et al.  Efficiently Inducing Features of Conditional Random Fields , 2002, UAI.

[10]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[11]  Charles Elkan Log-linear models and conditional random fields , 2007 .

[12]  P. Gehler,et al.  An introduction to graphical models , 2001 .

[13]  Wen-mei W. Hwu,et al.  Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.

[14]  Siddhartha Chatterjee,et al.  Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming , 2008, PPOPP.

[15]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[16]  Pat Hanrahan,et al.  ClawHMMER: A Streaming HMMer-Search Implementation , 2005, SC.

[17]  Roman Klinger,et al.  Classical Probabilistic Models and Conditional Random Fields , 2007 .

[18]  C. Notredame,et al.  Recent progress in multiple sequence alignment: a survey. , 2002, Pharmacogenomics.

[19]  David A. Bader,et al.  A tile-based parallel Viterbi algorithm for biological sequence alignment on GPU with CUDA , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[20]  Adam L. Berger,et al.  A Maximum Entropy Approach to Natural Language Processing , 1996, CL.

[21]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, SIGGRAPH 2004.

[22]  Jiulong Shan,et al.  Parallel Information Extraction on Shared Memory Multi-processor System , 2006, 2006 International Conference on Parallel Processing (ICPP'06).

[23]  Giorgio Valle,et al.  CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment , 2008, BMC Bioinformatics.

[24]  Vivek K. Pallipuram,et al.  Acceleration of spiking neural networks in emerging multi-core and GPU architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[25]  Fumihiko Ino,et al.  Design and implementation of the Smith-Waterman algorithm on the CUDA-compatible GPU , 2008, 2008 8th IEEE International Conference on BioInformatics and BioEngineering.

[26]  Jorja G. Henikoff,et al.  Using substitution probabilities to improve position-specific scoring matrices , 1996, Comput. Appl. Biosci..

[27]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[28]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[29]  Pat Hanrahan,et al.  ClawHMMER: A Streaming HMMer-Search Implementatio , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[30]  Serafim Batzoglou,et al.  CONTRAlign: Discriminative Training for Protein Sequence Alignment , 2006, RECOMB.