A Generalized Additive Model For Microarray Gene Expression Data Analysis

Abstract Microarray technology allows the measurement of expression levels of a large number of genes simultaneously. There are inherent biases in microarray data generated from an experiment. Various statistical methods have been proposed for data normalization and data analysis. This paper proposes a generalized additive model for the analysis of gene expression data. This model consists of two sub-models: a non-linear model and a linear model. We propose a two-step normalization algorithm to fit the two sub-models sequentially. The first step involves a non-parametric regression using lowess fits to adjust for non-linear systematic biases. The second step uses a linear ANOVA model to estimate the remaining effects including the interaction effect of genes and treatments, the effect of interest in a study. The proposed model is a generalization of the ANOVA model for microarray data analysis. We show correspondences between the lowess fit and the ANOVA model methods. The normalization procedure does not assume the majority of genes do not change their expression levels, and neither does it assume two channel intensities from the same spot are independent. The procedure can be applied to either one channel or two channel data from the experiments with multiple treatments or multiple nuisance factors. Two toxicogenomic experiment data sets and a simulated data set are used to contrast the proposed method with the commonly known lowess fit and ANOVA methods.

[1]  T. Kepler,et al.  Normalization and analysis of DNA microarray data by self-consistency and local regression , 2002, Genome Biology.

[2]  J. J. Chen,et al.  Profiling expression patterns and isolating differentially expressed genes by cDNA microarray system with colorimetry detection. , 1998, Genomics.

[3]  W. Cleveland Robust Locally Weighted Regression and Smoothing Scatterplots , 1979 .

[4]  James J. Chen,et al.  Normalization Methods for Analysis of Microarray Gene-Expression Data , 2003, Journal of biopharmaceutical statistics.

[5]  Terence P. Speed,et al.  A comparison of normalization methods for high density oligonucleotide array data based on variance and bias , 2003, Bioinform..

[6]  G. Parmigiani,et al.  A Statistical Analysis of Radiolabeled Gene Expression Data , 2001 .

[7]  S. Dudoit,et al.  Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. , 2002, Nucleic acids research.

[8]  Gary A. Churchill,et al.  Analysis of Variance for Gene Expression Microarray Data , 2000, J. Comput. Biol..

[9]  E. Wolski,et al.  Normalization strategies for cDNA microarrays. , 2000, Nucleic acids research.

[10]  A. Genz,et al.  Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts , 1999 .

[11]  Pierre R. Bushel,et al.  Assessing Gene Significance from cDNA Microarray Expression Data via Mixed Models , 2001, J. Comput. Biol..

[12]  J. Cherry,et al.  Iterative linear regression by sector: renormalization of cDNA microarray data and cluster analysis weighted by cross homology , 2001 .

[13]  G. Churchill,et al.  Experimental design for gene expression microarrays. , 2001, Biostatistics.

[14]  Rafael A Irizarry,et al.  Exploration, normalization, and summaries of high density oligonucleotide array probe level data. , 2003, Biostatistics.

[15]  Pierre R. Bushel,et al.  STATISTICAL ANALYSIS OF A GENE EXPRESSION MICROARRAY EXPERIMENT WITH REPLICATION , 2002 .