Cache-Oblivious Dynamic Programming for Bioinformatics

We present efficient cache-oblivious algorithms for several well-studied string problems in bioinformatics: the longest common subsequence, global pairwise sequence alignment and three-way sequence alignment (or median), both with affine gap costs, and RNA secondary structure prediction with simple pseudoknots. For each of these problems, our algorithms match the best-known time complexity, match or improve the best-known space complexity, and significantly improve on the cache efficiency of earlier algorithms. Experimental results show that our cache-oblivious algorithms run faster than software and implementations based on the previous best algorithms for these problems.
