The Analysis of Gene Expression Data: An Overview of Methods and Software

This chapter is a rough map of the book. It provides a concise overview of data-analytic tasks associated with microarray studies, pointers to chapters that can help perform these tasks, and connections with selected data-analytic tools not covered in any of the chapters. We wish to give a general orientation before moving to the detailed discussion provided by individual chapters. A comprehensive review of microarray data analysis methods is beyond the scope of this introduction.
