RNA design rules from a massive open laboratory

Significance

Self-assembling RNA molecules play critical roles throughout biology and bioengineering. To accelerate progress in RNA design, we present EteRNA, the first internet-scale citizen science "game" scored by high-throughput experiments. A community of 37,000 nonexperts leveraged continuous remote laboratory feedback to learn new design rules that substantially improve the experimental accuracy of RNA structure designs. These rules, distilled by machine learning into a new automated algorithm, EteRNABot, also significantly outperform prior algorithms in a gauntlet of independent tests. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science.

Abstract

Self-assembling RNA molecules present compelling substrates for the rational interrogation and control of living systems. However, imperfect in silico models, even at the secondary structure level, hinder the design of new RNAs that function properly when synthesized. Here, we present a unique and potentially general approach to such empirical problems: the Massive Open Laboratory. The EteRNA project connects 37,000 enthusiasts to RNA design puzzles through an online interface. Uniquely, EteRNA participants not only manipulate simulated molecules but also control a remote experimental pipeline for high-throughput RNA synthesis and structure mapping. We show herein that the EteRNA community leveraged dozens of cycles of continuous wet laboratory feedback to learn strategies for solving in vitro RNA design problems on which automated methods fail. The top strategies, including several previously unrecognized negative design rules, were distilled by machine learning into an algorithm, EteRNABot. Over a rigorous 1-year testing phase, both the EteRNA community and EteRNABot significantly outperformed prior algorithms in a dozen RNA secondary structure design tests, including the creation of dendrimer-like structures and scaffolds for small-molecule sensors. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science.
