Clustering Rfam 10.1: Clans, Families, and Classes

Genes 2012, 3

The Rfam database contains information about non-coding RNAs, emphasizing their secondary structures and organizing them into families of homologous RNA genes or functional RNA elements. Recently, a higher-order organization of Rfam into so-called clans was proposed together with its “decimal” release. In this scheme, some of the families have been assigned to clans on the basis of experimental and computational evidence that they are related. In the present work we investigate an alternative classification of the RNA families based on the tree edit distance between their secondary structures. The resulting clustering recovers some of the Rfam clans. The majority of clans, however, are not recovered by the structural clustering. Instead, they are dispersed into larger clusters, which correspond roughly to well-described RNA classes such as snoRNAs, miRNAs, and CRISPRs. In conclusion, structure-based clustering can help elucidate the relationships among Rfam families beyond the realm of clans and classes.
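The abstract does not spell out how the tree edit distance or the clustering is computed, so the following Python sketch only illustrates the general idea under explicit assumptions. Each secondary structure is given in dot-bracket notation and encoded as an ordered tree with one internal node per base pair and one leaf per unpaired base (an illustrative encoding, not necessarily the paper's). Distances use the Zhang-Shasha ordered tree edit distance with unit costs, and structures are grouped by single-linkage clustering at a hypothetical cut-off `threshold`. The names `parse_dot_bracket`, `tree_edit_distance`, and `single_linkage` are hypothetical helpers, not part of any library or of the authors' code.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: tree encoding, unit costs and single linkage are assumptions.
@dataclass
class Node:
    label: str                      # "R" root, "P" base pair, "U" unpaired base
    children: list = field(default_factory=list)

def parse_dot_bracket(db: str) -> Node:
    """Convert a dot-bracket string into an ordered tree (assumed encoding)."""
    root = Node("R")
    stack = [root]
    for ch in db:
        if ch == "(":
            pair = Node("P")
            stack[-1].children.append(pair)
            stack.append(pair)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch == ".":
            stack[-1].children.append(Node("U"))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root

def _postorder(root: Node):
    """Return node labels and leftmost-leaf indices, both in postorder."""
    labels, lml = [], []
    def visit(node: Node) -> int:
        first = None
        for child in node.children:
            leaf = visit(child)
            if first is None:
                first = leaf
        labels.append(node.label)
        lml.append(len(labels) - 1 if first is None else first)
        return lml[-1]
    visit(root)
    return labels, lml

def tree_edit_distance(a: Node, b: Node) -> int:
    """Zhang-Shasha ordered tree edit distance with unit insert/delete/relabel costs."""
    la, l1 = _postorder(a)
    lb, l2 = _postorder(b)
    # Keyroots: the highest postorder index for each distinct leftmost leaf.
    kr1 = sorted({l: i for i, l in enumerate(l1)}.values())
    kr2 = sorted({l: j for j, l in enumerate(l2)}.values())
    td = [[0] * len(lb) for _ in la]        # subtree-to-subtree distances
    for i in kr1:
        for j in kr2:
            i0, j0 = l1[i], l2[j]
            fd = [[0] * (j - j0 + 2) for _ in range(i - i0 + 2)]  # forest distances
            for x in range(1, i - i0 + 2):
                fd[x][0] = fd[x - 1][0] + 1          # delete
            for y in range(1, j - j0 + 2):
                fd[0][y] = fd[0][y - 1] + 1          # insert
            for x in range(1, i - i0 + 2):
                for y in range(1, j - j0 + 2):
                    k, h = x + i0 - 1, y + j0 - 1
                    if l1[k] == i0 and l2[h] == j0:  # both forests are whole trees
                        relabel = 0 if la[k] == lb[h] else 1
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[x - 1][y - 1] + relabel)
                        td[k][h] = fd[x][y]
                    else:                            # reuse a stored subtree distance
                        p, q = l1[k] - i0, l2[h] - j0
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[p][q] + td[k][h])
    return td[-1][-1]

def single_linkage(names, structures, threshold):
    """Flat single-linkage clusters: link two items whose distance is <= threshold."""
    trees = [parse_dot_bracket(s) for s in structures]
    parent = list(range(len(trees)))
    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for i in range(len(trees)):
        for j in range(i + 1, len(trees)):
            if tree_edit_distance(trees[i], trees[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())
```

For example, `tree_edit_distance(parse_dot_bracket("((..))"), parse_dot_bracket("((...))"))` evaluates to 1, since inserting a single unpaired base turns one tree into the other. Single linkage was picked here only because its flat clusters reduce to connected components of the graph of pairs within the cut-off; average or complete linkage would yield more compact clusters. For real Rfam data one would more likely rely on an established implementation such as RNAdistance from the ViennaRNA package.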