Discovering non-coding RNA elements in Drosophila 3′ untranslated regions

Non-coding RNA (ncRNA) elements in 3' untranslated regions (3'-UTRs) are known to participate in the post-transcriptional regulation of their genes, affecting mRNA stability, translation efficiency, and subcellular localization. Inferring co-expression patterns of genes by clustering their 3'-UTR ncRNA elements can provide valuable knowledge for further studies of their functions and interactions in specific physiological processes. In this work, we propose an improved RNA structural clustering pipeline that takes into account the length-dependent distribution of the structural similarity measure. Benchmarking the proposed pipeline on Rfam data demonstrates a performance gain of over 10% compared to a traditional hierarchical clustering pipeline. By applying the proposed clustering pipeline to Drosophila melanogaster 3'-UTRs, we identified 184 ncRNA clusters, of which 91.3% appear to be true RNA structural elements based on RNAz predictions. Among the clusters, we rediscovered the well-known histone ncRNA family as well as a number of other families whose potential functions may be inferred from existing studies. One such family contains genes that are preferentially expressed in male Drosophila. In situ hybridization further reveals their characteristic 'cup' or 'comet' localization patterns in the Drosophila testis. The complete clustering results are available at http://genome.ucf.edu/fly3UTRcluster.
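
The abstract does not spell out how the length dependence of the similarity measure is modeled. One simple way to picture the idea, not necessarily the authors' method, is to convert each raw similarity score into a z-score against a background distribution estimated for sequences of comparable length. In the sketch below, the function names, the fixed-width length bins, and the toy background scores are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_background(samples, bin_width=20):
    """Estimate (mean, std) of background scores per length bin.

    samples: iterable of (length, score) pairs taken from unrelated
    sequence pairs (hypothetical background data).
    """
    bins = defaultdict(list)
    for length, score in samples:
        bins[length // bin_width].append(score)
    return {b: (mean(v), pstdev(v)) for b, v in bins.items()}


def z_score(length, score, background, bin_width=20):
    """Normalize a raw score by the background for its length bin."""
    mu, sigma = background[length // bin_width]
    if sigma == 0:
        raise ValueError("background bin has zero variance")
    return (score - mu) / sigma


# Toy background: longer sequences tend to score higher by chance.
background = fit_background(
    [(50, 10.0), (55, 14.0), (120, 30.0), (125, 34.0)]
)

short_z = z_score(52, 16.0, background)   # raw 16 among short sequences
long_z = z_score(122, 16.0, background)   # raw 16 among long sequences
```

With this toy background, the same raw score of 16 lies above the background mean for short sequences and well below it for long ones, which is why a single raw-score cutoff applied across all lengths can mislead a clustering step.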
