SIBER: systematic identification of bimodally expressed genes using RNAseq data

MOTIVATION: Genes with bimodal expression play important roles in cell differentiation, signalling and disease progression, so identifying them is an important task. Several useful algorithms have been developed to identify bimodal genes from microarray data. Currently, no method can deal with data from next-generation sequencing, which is emerging as a replacement technology for microarrays.

RESULTS: We present SIBER (systematic identification of bimodally expressed genes using RNAseq data) for effectively identifying bimodally expressed genes from next-generation RNAseq data. We evaluate several candidate methods for modelling RNAseq count data and compare their performance in identifying bimodal genes through both simulation and real data analysis. We show that the lognormal mixture model performs best in terms of power and robustness under various scenarios. We also compare our method with alternative approaches, including profile analysis using clustering and kurtosis (PACK) and cancer outlier profile analysis (COPA). Our method is robust and powerful, is invariant to shifting and scaling, has no blind spots and has a sample-size-free interpretation.

AVAILABILITY: The R package SIBER is available at http://bioinformatics.mdanderson.org/main/OOMPA:Overview.
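To make the lognormal mixture idea concrete, the following is a minimal Python sketch, not the SIBER R package's implementation. It assumes a two-component normal mixture with a shared variance fitted by EM to log-transformed counts, and scores the fit with a bimodality index of the form sqrt(pi(1 - pi)) |mu1 - mu2| / sigma. The pseudo-count of 1, the quartile-based initialisation and the function names `fit_two_component_lognormal` and `bimodality_index` are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def fit_two_component_lognormal(counts, n_iter=500, tol=1e-10):
    """EM fit of a two-component, equal-variance normal mixture to log(counts + 1).

    Hypothetical helper for illustration; the +1 pseudo-count is an assumption.
    """
    y = np.log(np.asarray(counts, dtype=float) + 1.0)
    n = y.size
    # Initialise the component means at the lower and upper quartiles.
    mu1, mu2 = np.quantile(y, [0.25, 0.75])
    sigma2 = max(float(np.var(y)), 1e-12)
    pi = 0.5  # mixing proportion of the second (higher-mean) component
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to component 2,
        # computed from log-odds to avoid underflow in the densities.
        log_odds = (np.log(pi) - np.log1p(-pi)
                    + ((y - mu1) ** 2 - (y - mu2) ** 2) / (2.0 * sigma2))
        r = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50.0, 50.0)))
        # M-step: weighted means, pooled variance and mixing proportion.
        n2 = max(float(r.sum()), 1e-12)
        n1 = max(n - n2, 1e-12)
        new_mu1 = float(np.sum((1.0 - r) * y) / n1)
        new_mu2 = float(np.sum(r * y) / n2)
        resid = (1.0 - r) * (y - new_mu1) ** 2 + r * (y - new_mu2) ** 2
        sigma2 = max(float(np.sum(resid) / n), 1e-12)
        pi = float(np.clip(n2 / n, 1e-6, 1.0 - 1e-6))
        shift = abs(new_mu1 - mu1) + abs(new_mu2 - mu2)
        mu1, mu2 = new_mu1, new_mu2
        if shift < tol:  # stop once the means have stabilised
            break
    return {"mu1": mu1, "mu2": mu2, "sigma": float(np.sqrt(sigma2)), "pi": pi}


def bimodality_index(fit):
    """sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma for a fitted mixture."""
    delta = abs(fit["mu1"] - fit["mu2"]) / fit["sigma"]
    return float(np.sqrt(fit["pi"] * (1.0 - fit["pi"])) * delta)
```

In this sketch, a gene whose samples fall into two well-separated, reasonably balanced groups on the log scale receives a large index, while a gene with a single expression mode receives a small one. Because the index depends only on the mixing proportion and the standardised distance between the means, it does not grow with the number of samples, which is one way to read the abstract's "sample-size-free interpretation".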
