An Efficient Dual Sampling Algorithm with Hamming Distance Filtration

Recently, a framework that treats ribonucleic acid (RNA) sequences and their secondary structures as pairs has led to new information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. In this context, the pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter has been considered for designing more efficient inverse folding algorithms. In this work, we present the dual partition function filtered by Hamming distance, together with a Boltzmann sampler that uses novel dynamic programming routines for the loop-based energy model. The time complexity of the algorithm is [Formula: see text], where [Formula: see text] are the Hamming distance and the sequence length, respectively, reducing the time complexity of samplers reported in the literature by [Formula: see text]. We then present two applications: the first concerns the evolution of natural sequence-structure pairs of microRNAs, and the second the construction of neutral paths. The first studies the inverse folding rate (IFR) of sequence-structure pairs, filtered by Hamming distance, and observes that such pairs evolve toward higher levels of robustness, that is, toward increasing IFR. The second is an algorithm that constructs neutral paths: given two sequences in a neutral network, we employ the sampler to construct short paths connecting them that consist entirely of sequences contained in the neutral network.
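To make the idea of a Hamming-distance-filtered dual partition function concrete, the following is a minimal sketch under strong simplifying assumptions, not the algorithm of this work. It replaces the loop-based energy model with a hypothetical toy model in which every base pair contributes an independent energy and unpaired bases contribute none. The energies in `PAIR_ENERGY`, the constant `RT`, and the function names `parse_pairs`, `units`, `suffix_table`, and `sample` are all illustrative assumptions. For a fixed structure and a reference sequence, the table entry `T[k][d]` accumulates the Boltzmann weight of all assignments to units `k` onward that differ from the reference in exactly `d` positions, and the sampler walks the table forward.

```python
import math
import random

BASES = "ACGU"
# Hypothetical toy pair energies in kcal/mol (assumption; NOT the loop-based model).
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0, ("A", "U"): -2.0,
               ("U", "A"): -2.0, ("G", "U"): -1.0, ("U", "G"): -1.0}
RT = 0.6163  # approximate thermal energy in kcal/mol at 37 C (assumption)

def parse_pairs(structure):
    """Return the base pairs (i, j) of a dot-bracket structure."""
    stack, pairs = [], []
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.append((stack.pop(), i))
    return pairs

def units(structure, ref):
    """Split the structure into independent units (pairs, unpaired bases).

    Each unit is (positions, options); an option is
    (bases, Boltzmann weight, Hamming distance to the reference)."""
    pairs = parse_pairs(structure)
    paired = {p for pair in pairs for p in pair}
    out = []
    for i, j in pairs:
        opts = [((a, b), math.exp(-e / RT), (a != ref[i]) + (b != ref[j]))
                for (a, b), e in PAIR_ENERGY.items()]
        out.append(((i, j), opts))
    for i in range(len(structure)):
        if i not in paired:
            out.append(((i,), [((a,), 1.0, int(a != ref[i])) for a in BASES]))
    return out

def suffix_table(us, h_max):
    """T[k][d]: total weight of units k.. at Hamming distance exactly d."""
    n = len(us)
    T = [[0.0] * (h_max + 1) for _ in range(n + 1)]
    T[n][0] = 1.0
    for k in range(n - 1, -1, -1):
        for _, w, d in us[k][1]:
            for rest in range(h_max + 1 - d):
                T[k][d + rest] += w * T[k + 1][rest]
    return T

def sample(structure, ref, h, rng=random):
    """Boltzmann-sample a compatible sequence at Hamming distance exactly h."""
    us = units(structure, ref)
    T = suffix_table(us, h)
    if T[0][h] == 0.0:
        raise ValueError("no compatible sequence at this distance")
    seq, remaining = list(ref), h
    for k, (pos, opts) in enumerate(us):
        # Weight each option by the mass of completions that use up the
        # remaining distance budget exactly.
        weights = [w * T[k + 1][remaining - d] if d <= remaining else 0.0
                   for _, w, d in opts]
        bases, _, d = rng.choices(opts, weights=weights)[0]
        for p, b in zip(pos, bases):
            seq[p] = b
        remaining -= d
    return "".join(seq)
```

For example, `sample("((..))", "GGAACC", 2)` returns a sequence compatible with the structure that differs from the reference in exactly two positions. Because the toy model has no interactions between units, the table costs time proportional to the number of units times the distance bound. The loop-based model couples adjacent pairs and therefore needs the more elaborate routines described in this work.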
