Hardware acceleration of approximate palindromes searching

Understanding the structure and function of DNA sequences is an important area of research in modern biology. Unfortunately, analysis of such data is often complicated by mutations introduced by evolutionary processes. These mutations add an element of uncertainty, which increases the time complexity of sequence-analysis algorithms and complicates their practical use. One class of such algorithms searches for palindromes that may contain errors, known as approximate palindromes. The best state-of-the-art software methods have time complexity between linear and quadratic, depending on the required input parameters. This paper investigates the possibilities for hardware acceleration of approximate palindrome searching and describes a parametrized architecture suitable for FPGA chips. A prototype of the proposed architecture was implemented in VHDL and synthesized for Virtex technology. Application to test sequences shows that the circuit can speed up palindrome searching by up to 8000 times in comparison with the best-known software method, which relies on suffix arrays.
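To make the notion of an approximate palindrome concrete, the following is a minimal Python sketch of a naive search. It is neither the hardware architecture proposed here nor the suffix-array method used for comparison. It assumes a biological (reverse-complement) palindrome with no spacer between the arms, it counts only substitutions as errors, and it runs in quadratic time in the worst case. The names `find_approx_palindromes`, `max_mismatches`, and `min_arm` are hypothetical and chosen for illustration only.

```python
# Naive illustration only: assumes reverse-complement palindromes,
# no spacer between the arms, and substitution errors only.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def arm_length_at(seq, center, max_mismatches):
    """Extend outwards from the gap between seq[center-1] and seq[center].

    Returns the number of base pairs covered before the mismatch
    budget is exceeded or a sequence boundary is reached.
    """
    left, right = center - 1, center
    mismatches = 0
    arm = 0
    while left >= 0 and right < len(seq):
        if COMPLEMENT.get(seq[left]) != seq[right]:
            if mismatches == max_mismatches:
                break  # budget exhausted, stop extending
            mismatches += 1
        arm += 1
        left -= 1
        right += 1
    return arm


def find_approx_palindromes(seq, max_mismatches, min_arm):
    """Return (start, end) half-open intervals of approximate palindromes
    whose arms are at least min_arm bases long."""
    hits = []
    for center in range(1, len(seq)):
        arm = arm_length_at(seq, center, max_mismatches)
        if arm >= min_arm:
            hits.append((center - arm, center + arm))
    return hits
```

For example, `find_approx_palindromes("GAATTC", 0, 3)` reports the whole sequence as an exact palindrome, while `"GAATTG"` is reported only when one mismatch is allowed. Each center is examined independently of the others, so work of this kind can in principle be performed in parallel.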
