Fine-grained parallel RNA secondary structure prediction using SCFGs on FPGA

In RNA secondary structure prediction, the CYK (Cocke-Younger-Kasami) algorithm is one of the most popular methods based on an SCFG (stochastic context-free grammar) model. Accelerating SCFG computations for large models and large RNA database searches is a challenging task in computational bioinformatics. The parallel efficiency of general-purpose computer systems is limited by the O(L^3) computational complexity, where L is the sequence length, and by complex data dependencies. Furthermore, large-scale parallel computers are too expensive to be easily accessible to many research institutes. FPGA chips have recently emerged as a promising accelerator for the CYK algorithm because they allow a fine-grained custom design. We propose a systolic-like array structure, consisting of one master processing element (PE) and multiple slave PEs, for a fine-grained hardware implementation on FPGA that accelerates the CYK/inside algorithm with Query-Dependent Banding (QDB) heuristics. We partition the tasks by columns and assign them to PEs to balance the load. We exploit data reuse schemes to reduce the number of matrix loads from external memory. Experimental results show a speedup of more than 14x over Infernal 1.0 with QDB optimization, running on a PC with a 2.5 GHz Intel dual-core CPU, when aligning a single long RNA sequence to a large CM (covariance model) with thousands of states. For large-scale database search (cmsearch) with multiple input sequences, the computational power of our accelerator is comparable to that of a PC cluster of 16 quad-core Intel Xeon 2.0 GHz CPUs, while consuming only about 10% of the cluster's power.
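The O(L^3) cost and the column-based task partitioning mentioned above can be illustrated with a small software sketch. It assumes a hypothetical single-nonterminal toy grammar (S -> xS | Sx | xSy | SS | eps) with made-up probabilities, not the covariance models used by Infernal, and it omits QDB banding, which would restrict each column to a band of rows. The bifurcation rule S -> SS forces a loop over split points in every cell, which is where the cubic cost comes from. Filling the matrix column by column, from the bottom row upward within each column, satisfies every data dependence. The helper `assign_columns` and its cyclic column-to-PE mapping are illustrative assumptions, not the scheduling scheme of the hardware design.

```python
import math

NEG_INF = float("-inf")

# Hypothetical toy SCFG with a single nonterminal S (NOT Infernal's covariance model):
#   S -> x S | S x | x S y | S S | eps
RULE_P = {"left": 0.2, "right": 0.2, "pair": 0.3, "bif": 0.1, "end": 0.2}
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def log_single(x):
    # Uniform single-base emission over A, C, G, U.
    return math.log(0.25)


def log_pair(x, y):
    # The 6 canonical/wobble pairs share 0.9; the other 10 pairs share 0.1.
    return math.log(0.9 / 6) if (x, y) in PAIRS else math.log(0.1 / 10)


def cyk(seq):
    """Best log-probability parse of seq from S (CYK takes max instead of sum)."""
    L = len(seq)
    lp = {rule: math.log(p) for rule, p in RULE_P.items()}
    gamma = [[NEG_INF] * L for _ in range(L)]

    def g(i, j):
        # Span i..j; j < i is the empty span, generated only by S -> eps.
        return lp["end"] if j < i else gamma[i][j]

    for j in range(L):                # column by column, left to right
        for i in range(j, -1, -1):    # within a column, bottom row upward
            best = max(
                lp["left"] + log_single(seq[i]) + g(i + 1, j),
                lp["right"] + log_single(seq[j]) + g(i, j - 1),
            )
            if j > i:
                best = max(best, lp["pair"] + log_pair(seq[i], seq[j]) + g(i + 1, j - 1))
            # Bifurcation into two non-empty halves: O(L) work per cell, O(L^3) total.
            for k in range(i, j):
                best = max(best, lp["bif"] + g(i, k) + g(k + 1, j))
            gamma[i][j] = best
    return g(0, L - 1)


def column_work(j):
    # Bifurcation split points evaluated in column j: sum over i of (j - i).
    return j * (j + 1) // 2


def assign_columns(L, num_pes, cyclic=True):
    """Hypothetical: map each column j to a PE and return per-PE total work."""
    load = [0] * num_pes
    block = -(-L // num_pes)  # ceil(L / num_pes) columns per contiguous block
    for j in range(L):
        pe = j % num_pes if cyclic else j // block
        load[pe] += column_work(j)
    return load
```

Because column j costs about j^2/2 split points, contiguous blocks leave the last PE with most of the work: for L = 8 and two PEs, the cyclic mapping gives loads [34, 50] versus [10, 74] for blocks. A real systolic design must also respect that column j depends on earlier columns, which this simple load count ignores.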
