Identification of Differentially Expressed Genes with Multivariate Outlier Analysis

Abstract: DNA microarrays offer a powerful and effective technology for monitoring changes in the expression levels of thousands of genes simultaneously. They are widely applied to explore quantitative alterations in gene regulation in response to a variety of factors, including disease and exposure to toxicants. A common task in analyzing microarray data is to identify the genes that are differentially expressed between two experimental conditions. Because microarray data combine a large number of genes with a small number of arrays and a low signal-to-noise ratio, many traditional approaches are ill-suited to this task. In this paper, a multivariate mixture model is used to model the expression levels of replicated arrays, with the differentially expressed genes treated as outliers of the expression data. To detect the outliers of the multivariate mixture model, an effective and robust statistical method is applied to microarray analysis for the first time. This method identifies outliers by analyzing the kurtosis coefficient (KC) of projections of multivariate data arising from a mixture model. We apply the multivariate KC algorithm to our microarray experiment comparing control and toxicant-treated samples. After data processing, the differentially expressed genes are identified from the 1824 genes on the UCLA M07 microarray chip. We also use RT-PCR and two robust statistical methods, minimum covariance determinant (MCD) and minimum volume ellipsoid (MVE), to verify the expression levels of the outlier genes identified by the KC algorithm. We conclude that this robust multivariate tool is practical and effective for detecting differentially expressed genes.
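
The idea of flagging outliers through the kurtosis coefficient of projected data can be illustrated with a minimal sketch. It is a simplified stand-in, not the KC algorithm used in the paper: instead of optimizing the projection direction, it samples random unit directions, keeps the two with the largest and smallest kurtosis, and flags points whose robust z-score (median and MAD) on either projection exceeds a cutoff. The function names `kurtosis` and `kc_outliers`, the cutoff of 5, the number of directions, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def kurtosis(z):
    """Kurtosis coefficient (fourth standardized moment) of a 1-D sample."""
    c = z - z.mean()
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2


def kc_outliers(X, n_dirs=500, cutoff=5.0, seed=0):
    """Flag rows of X that are extreme on the projections with the
    largest and smallest kurtosis among randomly sampled directions.

    Hypothetical helper: a random search replaces the direction
    optimization of a full kurtosis-based procedure.
    """
    rng = np.random.default_rng(seed)

    # Robustly centre and scale each column (median and scaled MAD).
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    Z = (X - med) / mad

    # Sample random unit directions and project the data onto each one.
    dirs = rng.normal(size=(n_dirs, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = Z @ dirs.T  # shape: (n_points, n_dirs)

    # Kurtosis coefficient of every projection.
    k = np.array([kurtosis(proj[:, j]) for j in range(n_dirs)])

    flagged = np.zeros(X.shape[0], dtype=bool)
    for j in (np.argmax(k), np.argmin(k)):
        p = proj[:, j]
        m = np.median(p)
        s = 1.4826 * np.median(np.abs(p - m))
        flagged |= np.abs(p - m) / s > cutoff
    return flagged


# Synthetic stand-in for replicated log-ratios: 200 unchanged "genes"
# and 5 shifted "genes" measured on 3 replicate arrays (assumed data).
rng = np.random.default_rng(1)
inliers = rng.normal(0.0, 1.0, size=(200, 3))
outliers = rng.normal(0.0, 1.0, size=(5, 3)) + 8.0
X = np.vstack([inliers, outliers])

flags = kc_outliers(X)
print("flagged rows:", np.flatnonzero(flags))
```

A small fraction of outliers concentrated in one direction inflates the kurtosis of the projection onto that direction, which is why the maximum-kurtosis projection tends to separate them from the bulk of the data; the minimum-kurtosis projection is included because a large cluster of outliers instead makes the projection bimodal and lowers its kurtosis.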
