Comments on selected fundamental aspects of microarray analysis

Microarrays are becoming a ubiquitous research tool in the life sciences. However, the working principles of microarray-based methodologies are often misunderstood, or apparently ignored, by the researchers who actually perform and interpret the experiments. This in turn seems to lead to widespread over-expectation regarding the explanatory and knowledge-generating power of microarray analyses. In this note we explain the basic principles of five major groups of analytical techniques used in studies of microarray data and their interpretation: principal component analysis (PCA), independent component analysis (ICA), the t-test, analysis of variance (ANOVA), and self-organizing maps (SOM). We discuss answers to selected practical questions related to the analysis of microarray data. We also take a closer look at the experimental setup and the rules that must be observed in order to exploit microarrays efficiently. Finally, we discuss in detail the scope and limitations of microarray-based methods. We emphasize that no amount of statistical analysis can compensate for, or replace, a well-thought-out experimental setup. We conclude that microarrays are indeed useful tools in the life sciences, but they should by no means be expected to generate complete answers to complex biological questions. We argue that even well-posed questions, formulated within microarray-specific terminology, cannot be completely answered by microarray analyses alone.
