Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment

Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predicting and searching for the structures of new sequences. To identify the structure of remote homologs, structure-sequence alignment has to consider not only sequence similarity but also spatially conserved conformations caused by residue interactions, and is consequently computationally intractable. It is difficult to cope with this inefficiency without compromising alignment accuracy, especially for structure search in genomes or large databases. This paper introduces a novel method and a parameterized algorithm for structure-sequence alignment. Both the structure and the sequence are represented as graphs, where, in general, the graph for a biopolymer structure has a naturally small tree width. The algorithm constructs an optimal alignment by finding, in the sequence graph, the maximum-valued subgraph isomorphic to the structure graph. It has time complexity O(k^t N^2) for a structure of N residues and a tree decomposition of width t. The parameter k, typically small, is determined by a statistical cutoff for the correspondence between the structure and the sequence. This paper demonstrates a successful application of the algorithm to RNA structure search used for noncoding RNA identification. An application to protein threading is also discussed.
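The idea of optimizing over a tree decomposition can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `align`, the toy graph, the candidate lists, and both scoring functions are hypothetical, and the code makes no attempt to match the stated running-time bound. It assumes each structure vertex has a short list of candidate sequence positions (at most k of them) and that a rooted tree decomposition of the structure graph is given. Constraints such as residue ordering could be encoded by letting the hypothetical `edge_score` return `float("-inf")`.

```python
from itertools import product

def align(bags, children, root, candidates, vertex_score, edges, edge_score):
    """Best total score of a mapping from structure vertices to candidates.

    bags: list of tuples of structure vertices (a tree decomposition)
    children: dict bag index -> list of child bag indices
    candidates: dict vertex -> list of candidate sequence positions
    """
    # Visit bags parent-first; each vertex and edge is "owned" by the
    # topmost bag containing it, so its score is counted exactly once.
    owner_v, owner_e, order, stack = {}, {}, [], [root]
    while stack:
        b = stack.pop()
        order.append(b)
        for v in bags[b]:
            owner_v.setdefault(v, b)
        for (u, v) in edges:
            if u in bags[b] and v in bags[b]:
                owner_e.setdefault((u, v), b)
        stack.extend(children.get(b, []))

    table = {}
    for b in reversed(order):  # children are processed before parents
        bag, tab = bags[b], {}
        for assign in product(*(candidates[v] for v in bag)):
            pos = dict(zip(bag, assign))
            s = sum(vertex_score(v, pos[v]) for v in bag if owner_v[v] == b)
            s += sum(edge_score(u, v, pos[u], pos[v])
                     for (u, v), ob in owner_e.items() if ob == b)
            for c in children.get(b, []):
                shared = [v for v in bags[c] if v in pos]
                # Best child entry that agrees with this bag on shared vertices.
                s += max(val for ca, val in table[c].items()
                         if all(ca[bags[c].index(v)] == pos[v] for v in shared))
            tab[assign] = s
        table[b] = tab
    return max(table[root].values())

# Toy structure graph (hypothetical): 4 residues, width-2 decomposition.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
bags = [(0, 1, 2), (2, 3)]
children = {0: [1]}
cands = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}  # k = 2 per vertex

def vs(v, p):
    return (3 * v + 5 * p) % 7  # made-up integer scores

def es(u, v, pu, pv):
    return float("-inf") if pu >= pv else (pu * pv + u) % 5

print(align(bags, children, 0, cands, vs, edges, es))
```

Each bag enumerates only the candidate combinations of its own vertices, so the work per bag grows with k raised to the bag size rather than with the full number of residues.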
