A flexible approximate likelihood ratio test for detecting differential expression in microarray data

Identifying differentially expressed genes in microarray data has been studied extensively, and several methods have been proposed. The most popular methods for analyzing gene expression microarray data rely on a normality assumption and are based on a Wald statistic. These methods may be inefficient when expression levels follow a skewed distribution. To deal with possible violations of the normality assumption, we propose a method based on the Generalized Logistic Distribution of Type II (GLDII). The motivation behind this distributional assumption is to allow longer tails than the normal distribution, which is important in analyzing gene expression data because extreme values are common in such experiments. The shape parameter of the GLDII gives the flexibility to model a wide range of distributions. To reduce the computational burden of carrying out Likelihood Ratio (LR) tests for several thousand genes, we propose an Approximate LR Test (ALRT). We also generalize the two-class ALRT method to multi-class microarray data. Using simulation, we compare the performance of the ALRT method under the GLDII assumption with that of methods based on Wald-type statistics. The simulation results show that our method performs well relative to the significance analysis of microarrays (SAM) approach using standardized Wilcoxon rank statistics and to the empirical Bayes (E-B) t-statistics. Our method is also less sensitive to extreme values. We illustrate our method using two publicly available gene expression data sets.
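To make the idea of a likelihood ratio test under a GLDII model concrete, the following is a minimal sketch for a single gene measured in two classes. It is not the ALRT of this paper: the abstract does not give the approximate estimators, so the sketch instead maximizes the likelihood numerically over the location parameter only. The density is assumed to take the standard Type II generalized logistic form f(z) = α e^(−αz) / (1 + e^(−z))^(α+1) with z = (x − μ)/σ, and the scale σ and shape α are assumed known and common to both classes. All function names are hypothetical.

```python
import math

def gld2_logpdf(x, mu=0.0, sigma=1.0, alpha=1.0):
    """Log-density of a location-scale Type II generalized logistic.

    Assumed standard form: f(z) = alpha * exp(-alpha*z) / (1 + exp(-z))**(alpha + 1).
    """
    z = (x - mu) / sigma
    # Numerically stable log(1 + exp(-z)).
    softplus = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return math.log(alpha) - math.log(sigma) - alpha * z - (alpha + 1.0) * softplus

def loglik(xs, mu, sigma, alpha):
    """Log-likelihood of a sample for a given location."""
    return sum(gld2_logpdf(x, mu, sigma, alpha) for x in xs)

def fit_location(xs, sigma=1.0, alpha=1.0, tol=1e-8):
    """Maximize the log-likelihood over mu by golden-section search.

    With sigma and alpha fixed, the log-likelihood is concave in mu, and the
    maximizer lies between min(xs) and max(xs) shifted by sigma*log(alpha).
    """
    shift = sigma * math.log(alpha)
    a = min(xs) + shift - sigma
    b = max(xs) + shift + sigma
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda m: loglik(xs, m, sigma, alpha)
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:          # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def lr_test(x1, x2, sigma=1.0, alpha=1.0):
    """LR test of equal locations in two classes (sigma, alpha held fixed).

    Returns the statistic and a p-value from the chi-square(1) reference
    distribution, which is only an asymptotic approximation.
    """
    x1, x2 = list(x1), list(x2)
    pooled = x1 + x2
    l_alt = (loglik(x1, fit_location(x1, sigma, alpha), sigma, alpha)
             + loglik(x2, fit_location(x2, sigma, alpha), sigma, alpha))
    l_null = loglik(pooled, fit_location(pooled, sigma, alpha), sigma, alpha)
    stat = max(2.0 * (l_alt - l_null), 0.0)  # clamp tiny negative round-off
    p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    return stat, p_value

if __name__ == "__main__":
    control = [0.0, 0.2, -0.1, 0.1]      # made-up log-expression values
    treated = [3.0, 3.2, 2.9, 3.1]
    print(lr_test(control, treated, sigma=1.0, alpha=1.0))
```

In a genome-wide setting this test would be repeated for every gene, which is where the cost of numerical maximization becomes noticeable and where a closed-form approximation such as the one the abstract proposes would pay off.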
