Robust Modeling of Differential Gene Expression Data Using Normal/Independent Distributions: A Bayesian Approach

This paper addresses the problem of identifying genes that are differentially expressed across conditions from gene expression microarray data in the presence of outliers. To this end, gene expression data are modeled robustly using the family of normal/independent distributions. This family includes the normal and Student’s t distributions, which have been used previously, as well as extensions such as the slash, contaminated normal, and Laplace distributions. The aim is to identify differentially expressed genes under these distributional assumptions rather than under the normal distribution alone. A Bayesian approach based on Markov chain Monte Carlo is adopted for parameter estimation. Two publicly available gene expression data sets are analyzed with the proposed approach, and the use of the robust models for detecting differentially expressed genes is investigated. The investigation shows that the choice of model matters considerably, because each gene has only a small number of replicates and the data contain outliers. The performance of the models is compared using several statistical criteria and ROC curves. The method is also illustrated with simulation studies. We demonstrate the flexibility of these robust models in identifying differentially expressed genes.
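As a rough illustration of what the normal/independent family looks like in practice, the sketch below assumes the standard scale-mixture representation y = mu + sigma * z / sqrt(u), where z is standard normal and u is a positive mixing variable drawn independently of z. The function name `sample_ni`, its parameter names, and all default values are hypothetical and chosen for illustration only; this is not the paper's MCMC estimation procedure, only a way to see how the choice of mixing law thickens the tails.

```python
import numpy as np


def sample_ni(n, family, mu=0.0, sigma=1.0, nu=4.0, eps=0.1, gamma=0.1, rng=None):
    """Draw n values y = mu + sigma * z / sqrt(u) with z ~ N(0, 1).

    The mixing variable u depends on `family` (illustrative parameterizations):
      "normal"       : u = 1
      "t"            : u ~ Gamma(shape=nu/2, rate=nu/2)
      "slash"        : u ~ Beta(nu, 1)
      "contaminated" : u = gamma with probability eps, else 1 (0 < gamma < 1)
      "laplace"      : 1/u ~ Exponential(mean 2), giving a unit-scale Laplace
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)
    if family == "normal":
        u = np.ones(n)
    elif family == "t":
        # NumPy's gamma takes a scale, so rate nu/2 becomes scale 2/nu.
        u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    elif family == "slash":
        u = rng.beta(nu, 1.0, size=n)
    elif family == "contaminated":
        u = np.where(rng.random(n) < eps, gamma, 1.0)
    elif family == "laplace":
        u = 1.0 / rng.exponential(scale=2.0, size=n)
    else:
        raise ValueError(f"unknown family: {family}")
    return mu + sigma * z / np.sqrt(u)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for fam in ["normal", "t", "slash", "contaminated", "laplace"]:
        y = sample_ni(100_000, fam, rng=rng)
        # Share of draws beyond three units: a crude tail-weight indicator.
        print(fam, float(np.mean(np.abs(y) > 3.0)))
```

Small values of u inflate the variance of individual observations, which is how these models absorb outlying replicates without distorting the location estimate.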
