Arrow plot: a new graphical tool for selecting up and down regulated genes and genes differentially expressed on sample subgroups

Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples, or across samples subjected to different experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments that involve molecular classification of tumors it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of mixtures of subsamples. Consequently, genes that are differentially expressed in sample subgroups can be missed when the usual statistical approaches are used. In this paper we propose a new graphical tool that identifies not only up- and down-regulated genes, but also genes with differential expression in different subclasses. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve (AUC). The methodology proposed here was implemented in the open-source R software.

Results: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with those obtained using some of the standard methods for detecting differentially expressed genes, namely the Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). On both datasets, the differentially expressed genes with bimodal or multimodal distributions were missed by the standard selection procedures. We also compared our results with (i) the area between the ROC curve and the rising diagonal (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and allows the two samples to have different variances. Another advantage of our method is that the behavior of different kinds of differentially expressed genes can be analyzed graphically.

Conclusion: Our results indicate that the arrow plot represents a new, flexible and useful tool for the analysis of gene expression profiles from microarrays.
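To illustrate the two measures on which the tool is based, the following is a minimal sketch that estimates the AUC and the OVL for a single simulated gene. It is not the authors' R implementation. The function names, the Gaussian kernel, the normal-reference bandwidth, the grid size and the simulated data are all assumptions made for this example. The case group is drawn from a two-component mixture, so the AUC is expected to stay close to 0.5 while the OVL is expected to fall well below 1, which is the kind of subgroup gene that a purely AUC-based selection would likely overlook.

```python
import math
import random


def auc(x, y):
    """Empirical AUC: proportion of pairs with y > x, counting ties as 1/2."""
    wins = sum((b > a) + 0.5 * (b == a) for a in x for b in y)
    return wins / (len(x) * len(y))


def kde(sample, grid):
    """Gaussian kernel density estimate evaluated on grid.

    Assumption: normal-reference bandwidth h = 1.06 * sd * n^(-1/5).
    """
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)
    norm = n * h * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in sample) / norm
            for g in grid]


def ovl(x, y, points=512):
    """Overlapping coefficient: integral of min(f_x, f_y), trapezoidal rule."""
    lo, hi = min(min(x), min(y)), max(max(x), max(y))
    pad = hi - lo  # widen the grid so the kernel tails are included
    step = (hi - lo + 2.0 * pad) / (points - 1)
    grid = [lo - pad + i * step for i in range(points)]
    m = [min(a, b) for a, b in zip(kde(x, grid), kde(y, grid))]
    return step * (sum(m) - 0.5 * (m[0] + m[-1]))


# Hypothetical gene: unimodal controls, bimodal cases (two hidden subgroups).
random.seed(0)
control = [random.gauss(0.0, 1.0) for _ in range(200)]
case = ([random.gauss(-2.0, 0.5) for _ in range(100)]
        + [random.gauss(2.0, 0.5) for _ in range(100)])

gene_auc = auc(control, case)
gene_ovl = ovl(control, case)
print(f"AUC = {gene_auc:.3f}, OVL = {gene_ovl:.3f}")
```

Computing both values for every gene and plotting one against the other gives the kind of two-measure display that the abstract describes. Any selection thresholds on the two axes would have to be chosen by the analyst and are not specified here.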
