Fast Matching of CBG Patterns with Applications to Protein Matching

The large data sets produced in many biological applications make pattern matching in computational biology a challenge. We present a technique for matching an important class of protein patterns. We show how such a protein pattern can be represented as a logical expression, from which a circuit can be automatically synthesised and implemented on field programmable gate arrays, leading to highly parallelisable solutions. The method was tested on the Prosite database; almost all the patterns could be handled very efficiently, with throughput in most cases in excess of 10^8 symbols per second.
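
To make the idea of a pattern as a logical expression concrete, the following is a minimal software sketch, not the synthesis procedure or circuit described above. It assumes a simplified subset of the Prosite pattern syntax (single residues, `x`, `[..]` classes, `{..}` exclusions, and `(n)` or `(n,m)` repetition, without anchors). The names `parse` and `match_ends` are hypothetical. Each entry of `active` plays the role of one state bit, and its update is a Boolean function of the previous bit and whether the current symbol belongs to a character class, which is the kind of expression that could in principle be mapped to hardware.

```python
import re

AMINO = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acid codes

# One pattern element: a symbol spec, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def parse(pattern):
    """Expand a pattern into a list of (allowed_set, is_optional) positions."""
    positions = []
    for elem in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(elem)
        if m is None:
            raise ValueError("unsupported element: " + elem)
        sym, lo, hi = m.groups()
        lo = int(lo) if lo else 1
        hi = int(hi) if hi else lo
        if sym == "x":
            allowed = AMINO
        elif sym[0] == "[":
            allowed = set(sym[1:-1])
        elif sym[0] == "{":
            allowed = AMINO - set(sym[1:-1])
        else:
            allowed = {sym}
        # lo mandatory copies, then (hi - lo) optional copies for a bounded gap
        positions += [(allowed, False)] * lo + [(allowed, True)] * (hi - lo)
    return positions

def match_ends(pattern, text):
    """Return the indices in text at which some occurrence of pattern ends."""
    pos = parse(pattern)
    n = len(pos)

    def close(bits):
        # An optional position may be skipped without consuming a symbol.
        for i in range(n):
            if bits[i] and pos[i][1]:
                bits[i + 1] = True

    ends = []
    active = [False] * (n + 1)  # active[i]: first i positions already matched
    for t, c in enumerate(text):
        active[0] = True        # a match may start at every text position
        close(active)
        nxt = [False] * (n + 1)
        for i in range(n):
            # Boolean update: previous state bit AND class membership of c
            if active[i] and c in pos[i][0]:
                nxt[i + 1] = True
        active = nxt
        close(active)
        if active[n]:
            ends.append(t)
    return ends
```

For example, `match_ends("C-x(2,4)-C", "ACAACAAAC")` reports matches ending at indices 4 and 8. Because every state bit depends only on its predecessor and the current symbol, all bits can be updated simultaneously, which is what makes a hardware realisation attractive; this sketch updates them sequentially in software.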
