Regmex, Motif analysis in ranked lists of sequences

Motif analysis has long been an important method for characterizing biological function, and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored to specific biological questions. Here, we present a motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in a ranked list of sequences. Regmex uses regular expressions to define motifs or families of motifs, and embedded Markov models to calculate exact probabilities for motif observations in sequences. Motif enrichment can be evaluated, at the user's choice, using random walks, Brownian bridges, or modified rank-based statistics. These features make Regmex well suited for a range of biological sequence analysis problems related to motif discovery. We demonstrate different usage scenarios, including rank correlation of microRNA binding sites co-occurring with a U-rich motif. The method is available as an R package.
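The abstract combines two ideas: an automaton for a motif embedded in a Markov model, so that the probability of observing the motif in a sequence is exact rather than approximated, and a running-sum (random walk) statistic along the ranked list that is tested as a Brownian bridge. The Python sketch below illustrates both under stated assumptions and is not the Regmex implementation. Motifs are limited to fixed-length words with optional character classes such as `TTT[AG]` (not full regular expressions), the background is assumed i.i.d. (a zero-order Markov model), and the bridge statistic is a generic Kolmogorov-type construction that may differ from the one Regmex uses. All function names (`parse_motif`, `build_dfa`, `prob_at_least_one`, `kolmogorov_pvalue`, `bridge_test`) are hypothetical and are not part of the Regmex R package.

```python
import math

# Assumptions for this sketch: DNA alphabet, i.i.d. background, and motifs
# limited to fixed-length words with optional [..] classes (not full regexes).
ALPHABET = "ACGT"

def parse_motif(pattern):
    """Turn e.g. 'TTT[AG]' into [{'T'}, {'T'}, {'T'}, {'A', 'G'}]."""
    classes, i = [], 0
    while i < len(pattern):
        if pattern[i] == "[":
            j = pattern.index("]", i)
            classes.append(set(pattern[i + 1:j]))
            i = j + 1
        else:
            classes.append({pattern[i]})
            i += 1
    return classes

def build_dfa(classes):
    """Subset construction for the NFA of ALPHABET* followed by the motif."""
    m = len(classes)
    start = frozenset({0})
    index, trans, todo = {start: 0}, {}, [start]
    while todo:
        state = todo.pop()
        for c in ALPHABET:
            nxt = {0}  # NFA state 0 loops on every character
            for q in state:
                if q < m and c in classes[q]:
                    nxt.add(q + 1)
            nxt = frozenset(nxt)
            if nxt not in index:
                index[nxt] = len(index)
                todo.append(nxt)
            trans[(index[state], c)] = index[nxt]
    accept = {i for s, i in index.items() if m in s}
    return trans, accept, len(index)

def prob_at_least_one(pattern, length, freqs):
    """Exact P(motif occurs at least once) in an i.i.d. sequence of `length`."""
    trans, accept, n = build_dfa(parse_motif(pattern))
    dist = [0.0] * n
    dist[0] = 1.0  # DFA start state
    hit = 0.0      # probability mass absorbed in an accepting state
    for _ in range(length):
        new = [0.0] * n
        for s, p in enumerate(dist):
            for c in ALPHABET:
                t = trans[(s, c)]
                if t in accept:
                    hit += p * freqs[c]
                else:
                    new[t] += p * freqs[c]
        dist = new
    return hit

def kolmogorov_pvalue(d):
    """P(sup |B(t)| > d) for a standard Brownian bridge B on [0, 1]."""
    if d <= 0.1:
        return 1.0  # the CDF is below 1e-50 here, so the tail is 1 in floats
    if d < 1.0:  # series that converges fast for small d
        cdf = math.sqrt(2 * math.pi) / d * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * d * d))
            for k in range(1, 51))
        return min(1.0, max(0.0, 1.0 - cdf))
    tail = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * d * d)
                   for k in range(1, 51))
    return min(1.0, max(0.0, tail))

def bridge_test(observed, probs):
    """observed[i] is 1 if the i-th ranked sequence has the motif, probs[i] its
    background probability; assumes some probability is strictly in (0, 1)."""
    total_var = sum(p * (1 - p) for p in probs)
    walk, times, s, v = [], [], 0.0, 0.0
    for o, p in zip(observed, probs):
        s += o - p          # running sum of observed minus expected
        v += p * (1 - p)    # cumulative variance serves as the time axis
        walk.append(s)
        times.append(v / total_var)
    end = walk[-1]
    # Subtract t * end so the walk is pinned to 0 at both ends, then standardize
    d = max(abs(s_k - t_k * end) for s_k, t_k in zip(walk, times))
    d /= math.sqrt(total_var)
    return d, kolmogorov_pvalue(d)

if __name__ == "__main__":
    uniform = {c: 0.25 for c in ALPHABET}
    # Hypothetical ranked list: 200 sequences of length 50, motif in the top 40
    p = prob_at_least_one("TTT[AG]", 50, uniform)
    d, pval = bridge_test([1] * 40 + [0] * 160, [p] * 200)
    print(f"P(motif) = {p:.4f}, D = {d:.2f}, p-value = {pval:.3g}")
```

Tracking the automaton state instead of raw sequence positions is what keeps the probability exact for self-overlapping motifs such as `AA`. A higher-order Markov background would enlarge the state to pairs of (automaton state, preceding nucleotides) without changing the overall scheme.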
