The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison

Background
Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns the differential expression of genes between two conditions, together with an assessment of statistical significance. This process typically consists of two steps: data normalization, and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The suite of tools in TM4, by contrast, relies on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filtering, and analysis methods form an analysis pipeline. As input to its normalization steps, TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software. Expresso, in contrast, can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analyses of two experiments and assess the results against qRT-PCR data.

Results
Expresso analysis using MIV data consistently identifies more genes as differentially expressed than Expresso analysis using IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per-microarray basis. Subsequent statistical analysis with either Expresso or a TM4 t-test can then effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained with Expresso analysis of MIV data.

Conclusion
The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results.
In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, because it yields a greater number of statistically significant differentially expressed genes; TM4 does not support MIV as input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate results, when judged against qRT-PCR data, in the context of an experimental design of modest complexity.
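To illustrate how the two input summaries can diverge, the sketch below computes both for a single toy spot. It assumes, hypothetically, that each channel of a spot is available as a list of foreground pixel intensities (scanner software normally reports these summaries directly). IIV is taken as average intensity times pixel count, as described above, and MIV as the median pixel intensity. The function names and pixel values are invented for illustration only.

```python
import math
from statistics import mean, median


def integrated_intensity(avg_intensity, pixel_count):
    """IIV: average spot intensity multiplied by the spot's pixel count."""
    return avg_intensity * pixel_count


def summarize_spot(pixels):
    """Return IIV and MIV for one spot in one channel.

    `pixels` is a hypothetical list of foreground pixel intensities.
    """
    return {
        "IIV": integrated_intensity(float(mean(pixels)), len(pixels)),
        "MIV": float(median(pixels)),
    }


def log2_ratio(pixels_ch1, pixels_ch2, measure):
    """log2(channel 1 / channel 2) for one spot, using the chosen summary."""
    s1 = summarize_spot(pixels_ch1)[measure]
    s2 = summarize_spot(pixels_ch2)[measure]
    return math.log2(s1 / s2)


# Toy spot: channel 1 contains one artifact pixel (4000), e.g. dust or saturation.
ch1 = [100, 102, 98, 101, 99, 4000]
ch2 = [50, 51, 49, 50, 50, 50]

print(summarize_spot(ch1))                    # {'IIV': 4500.0, 'MIV': 100.5}
print(round(log2_ratio(ch1, ch2, "IIV"), 3))  # 3.907
print(round(log2_ratio(ch1, ch2, "MIV"), 3))  # 1.007
```

Because both channels share the same pixel count here, the count cancels in the ratio, and the difference comes entirely from mean versus median: the single artifact pixel pushes the IIV-based log2 ratio to about 3.9, while the MIV-based ratio stays near 1. This is only a toy illustration of the median's robustness to outlier pixels, not an account of why MIV yielded more significant genes in the experiments compared here.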

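The two-stage log-linear mixed model can be written in a generic form following the approach of Wolfinger and colleagues, sketched here for a two-color design. The indices and terms are assumptions for illustration: i indexes arrays, j dyes, k treatments, and g genes, and y_{ijkg} is the log2 spot intensity (computed from either IIV or MIV). The terms Expresso actually fits depend on the experimental design and may differ from this sketch.

```latex
\begin{aligned}
\text{Stage 1 (all spots):}\quad & y_{ijkg} = \mu + A_i + D_j + (AD)_{ij} + \varepsilon_{ijkg},\\
& r_{ijkg} = y_{ijkg} - \hat{\mu} - \hat{A}_i - \hat{D}_j - \widehat{(AD)}_{ij},\\
\text{Stage 2 (gene } g\text{):}\quad & r_{ijkg} = G_g + (AG)_{ig} + (DG)_{jg} + (TG)_{kg} + \gamma_{ijkg}
\end{aligned}
```

Stage 1 removes array and channel effects shared by all genes, and its residuals serve as normalized values. Stage 2 is fitted separately for each gene, with array-related terms such as A_i and (AG)_{ig} typically treated as random effects; a gene is called differentially expressed when the treatment contrast (TG)_{1g} - (TG)_{2g} differs significantly from zero.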