Sequence‐structure relations of biopolymers

Motivation: DNA data is transcribed into single‐stranded RNA, which folds into specific molecular structures. In this paper we ask to what extent sequence and structure information correlate. We view this correlation as the structural semantics of sequence data, which allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded ‘patterns’ in DNA and RNA sequences.

Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence‐structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis of three PDB structures: 2JXV (hairpin), 2N3R (3‐branch multi‐loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann‐sampled sequences with that of the sequences that refold into the same structure, and derive a criterion to identify native structures. We show that the partition function of a fixed structure contains multiple sequences that have nearly the same mutual information yet align poorly with one another. This suggests that relevant patterns may be embedded in the sequences that cannot be discovered using alignments.

Availability and Implementation: The source code is freely available at http://staff.vbi.vt.edu/fenixh/Sampler.zip

Contact: duckcr@vbi.vt.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
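The sequence partition function and Boltzmann sampler described under Results can be illustrated with a minimal sketch. It is not the authors' implementation and does not use a loop-based thermodynamic model. It assumes a toy energy in which each base pair of the fixed structure contributes independently. The energy values, the mismatch penalty, and all function names are hypothetical and chosen only for illustration. Under this assumption the partition function over all sequences factorizes, which a brute-force enumeration can confirm on a short structure.

```python
import math
import random
from itertools import product

BASES = "ACGU"
RT = 0.616  # kcal/mol, approximately R*T at 37 degrees C

# Hypothetical toy pair energies in kcal/mol (NOT a real thermodynamic model).
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}
MISMATCH = 2.0  # hypothetical penalty for any non-canonical pair


def parse_pairs(structure):
    """Return the base pairs (i, j) of a dot-bracket string."""
    stack, pairs = [], []
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.append((stack.pop(), i))
    return pairs


def energy(seq, pairs):
    """Toy energy of a sequence on the fixed structure."""
    return sum(PAIR_ENERGY.get((seq[i], seq[j]), MISMATCH) for i, j in pairs)


def pair_weights():
    """Boltzmann weight of each of the 16 possible nucleotide pairs."""
    return {
        (a, b): math.exp(-PAIR_ENERGY.get((a, b), MISMATCH) / RT)
        for a in BASES for b in BASES
    }


def partition_function(structure):
    """Sum of Boltzmann weights over all sequences, using factorization."""
    pairs = parse_pairs(structure)
    unpaired = len(structure) - 2 * len(pairs)
    q_pair = sum(pair_weights().values())
    return (4 ** unpaired) * q_pair ** len(pairs)


def brute_force_partition_function(structure):
    """Same sum by explicit enumeration; feasible only for short structures."""
    pairs = parse_pairs(structure)
    return sum(
        math.exp(-energy(s, pairs) / RT)
        for s in product(BASES, repeat=len(structure))
    )


def sample_sequence(structure, rng):
    """Draw one sequence with probability proportional to its Boltzmann weight."""
    weights = pair_weights()
    keys = list(weights)
    # Unpaired positions carry no energy in this toy model, so they are uniform.
    seq = [rng.choice(BASES) for _ in structure]
    for i, j in parse_pairs(structure):
        a, b = rng.choices(keys, weights=[weights[k] for k in keys])[0]
        seq[i], seq[j] = a, b
    return "".join(seq)
```

For example, `sample_sequence("((..))", random.Random(0))` draws one sequence for a small hairpin-like structure. In a realistic energy model the loops couple neighbouring positions, so the factorization above no longer holds and a dynamic-programming recursion over the structure would be needed instead.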
