Exploring statistical methods for analysis of microarray data

The expansion of molecular biology in recent years has produced a growing volume of data and a corresponding interest in tools designed to analyze them. Many of these data come from high-throughput technologies that measure hundreds or thousands of variables simultaneously; one such technology in current use is the microarray. Expression analysis has three major objectives: data preprocessing, identification of differentially expressed genes, and grouping of genes by common behavior. Extracting useful gene-expression information from the raw output is not trivial. The data collection process is noisy, because non-biological bias can be introduced at several points by the operators or by the technology itself. Identifying differential expression is an important step in reducing the number of variables of interest, p, to a manageable scale; it requires distinguishing random variation in the expression measurements from the signal of interest. Most statistical research to date has focused on this problem, and many methods exist for making the determination. Finally, grouping genes is biologically important for inferring the function of genes whose role is unknown and for revealing the interconnections between biological systems. We focus on the first and last of these objectives, using relatively standard methods for the second.
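
The three objectives can be illustrated with a small sketch. It is not the method developed here: the toy values are hypothetical, and each step is a deliberately simple, commonly used stand-in chosen only for illustration. Preprocessing is global median centering of each array's log2 ratios, which assumes most genes are unchanged and treats any array-wide shift as non-biological bias. The twofold cutoff (|log2 ratio| >= 1) is a crude placeholder for a proper differential-expression test, since it does not model replicate variation at all. Grouping uses plain k-means (Lloyd's algorithm) with Euclidean distance. The function names `median_normalize`, `fold_change_filter` and `kmeans` are hypothetical.

```python
import statistics
from typing import List

# Rows = genes, columns = arrays; entries are log2(red/green) ratios.
Matrix = List[List[float]]


def median_normalize(log_ratios: Matrix) -> Matrix:
    """Subtract each array's median log2 ratio from every gene on that array.

    Assumes most genes are unchanged, so a shift of an array's median away
    from 0 is treated as non-biological (e.g. dye or scanner) bias.
    """
    n_arrays = len(log_ratios[0])
    medians = [statistics.median([row[j] for row in log_ratios]) for j in range(n_arrays)]
    return [[row[j] - medians[j] for j in range(n_arrays)] for row in log_ratios]


def fold_change_filter(log_ratios: Matrix, min_abs_log2: float = 1.0) -> List[int]:
    """Indices of genes whose largest |log2 ratio| reaches the cutoff (1.0 = twofold).

    A crude placeholder for a real differential-expression test: it ignores
    replicate variability entirely.
    """
    return [i for i, row in enumerate(log_ratios) if max(abs(x) for x in row) >= min_abs_log2]


def _sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Matrix, k: int, n_iter: int = 100) -> List[int]:
    """Lloyd's k-means; the first k profiles seed the centroids (deterministic)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no profile changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


# Hypothetical toy data: 7 genes x 4 arrays (e.g. time points).
true_profiles = [
    [0.0, 1.0, 2.0, 3.0],     # gene 0: rising
    [0.0, -1.0, -2.0, -3.0],  # gene 1: falling
    [0.0, 0.1, -0.1, 0.0],    # gene 2: flat
    [0.0, 1.2, 2.2, 2.8],     # gene 3: rising
    [0.0, -1.1, -1.9, -3.1],  # gene 4: falling
    [0.1, 0.0, 0.0, -0.1],    # gene 5: flat
    [0.0, 0.0, 0.1, 0.0],     # gene 6: flat
]
array_bias = [0.2, -0.3, 1.2, 0.4]  # simulated non-biological shift per array
observed = [[x + b for x, b in zip(row, array_bias)] for row in true_profiles]

print(fold_change_filter(observed))  # -> [0, 1, 2, 3, 4, 5, 6]
normalized = median_normalize(observed)
selected = fold_change_filter(normalized)
labels = kmeans([normalized[i] for i in selected], k=2)
print(selected, labels)  # -> [0, 1, 3, 4] [0, 1, 0, 1]
```

Before normalization the filter flags every gene, because the bias added to the third array alone exceeds the twofold cutoff; after median centering only the four genes with real trends survive, and k-means separates the rising profiles from the falling ones. Seeding with the first k points keeps the example deterministic; in practice k-means is usually restarted from several random seeds, and the number of clusters must be chosen separately.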
