A Configware Approach for High-Speed Parallel Analysis of Genomic Data

Many problems in bioinformatics pose great computational challenges because of the huge amount of biological data to be analyzed. Reconfigurable systems can provide custom-computing machines that run orders of magnitude faster than regular software on general-purpose processors. We present a methodology for applying a configware system to an interesting problem in molecular biology: detecting splice junctions in eukaryotic genes. Decision trees were built from a benchmark of DNA sequences, converted into logical equations, simplified, and subjected to Boolean minimization. The resulting circuit was implemented in reconfigurable parallel hardware and evaluated with a five-fold cross-validation procedure, which was itself run as a second level of parallelism. The average accuracy achieved was 90.41%, and the circuit takes 18 ns to process each 60-nucleotide data record.
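The step from decision tree to logical equation can be illustrated with a minimal Python sketch. It is not the authors' tool chain: the toy tree, the tested positions, the class labels `EI`, `IE`, and `N`, and all function names are assumptions made for illustration only. Each root-to-leaf path that ends in a given class becomes one AND term, and the OR of those terms is a sum-of-products equation of the kind that could then be handed to a Boolean minimizer (the minimization itself is not shown).

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str            # predicted class at this leaf


@dataclass
class Node:
    pos: int              # 0-based nucleotide position tested (hypothetical)
    base: str             # nucleotide the position is compared against
    yes: "Tree"           # subtree taken when seq[pos] == base
    no: "Tree"            # subtree taken otherwise


Tree = Union[Leaf, Node]


def classify(tree, seq):
    """Walk the tree for one sequence and return the leaf label."""
    while isinstance(tree, Node):
        tree = tree.yes if seq[tree.pos] == tree.base else tree.no
    return tree.label


def paths_to(tree, label, prefix=()):
    """Collect every root-to-leaf path ending in `label`.

    Each path is a tuple of (pos, base, wanted) literals; together the
    paths form a sum-of-products description of the class.
    """
    if isinstance(tree, Leaf):
        return [prefix] if tree.label == label else []
    return (paths_to(tree.yes, label, prefix + ((tree.pos, tree.base, True),))
            + paths_to(tree.no, label, prefix + ((tree.pos, tree.base, False),)))


def to_equation(tree, label):
    """Render the sum-of-products form as a readable string."""
    terms = []
    for path in paths_to(tree, label):
        lits = [("" if wanted else "!") + f"p{pos}={base}"
                for pos, base, wanted in path]
        terms.append("(" + " & ".join(lits) + ")")
    return " | ".join(terms)


def eval_sop(terms, seq):
    """Evaluate the sum-of-products form directly on a sequence."""
    return any(all((seq[pos] == base) == wanted for pos, base, wanted in term)
               for term in terms)


# Toy tree, invented for this example; it is not a tree from the paper.
TOY_TREE = Node(30, "G",
                yes=Node(31, "T", yes=Leaf("EI"), no=Leaf("N")),
                no=Node(29, "A", yes=Leaf("IE"), no=Leaf("N")))

if __name__ == "__main__":
    print(to_equation(TOY_TREE, "EI"))   # (p30=G & p31=T)
    print(to_equation(TOY_TREE, "N"))    # (p30=G & !p31=T) | (!p30=G & !p29=A)
```

Because every literal is a simple equality test on one position, each AND term corresponds to a small block of combinational logic, and the terms for all classes can be evaluated at the same time, which is the kind of parallelism the hardware implementation relies on.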
