A microarray analysis for differential gene expression in the soybean genome using Bioconductor and R

This article describes specific procedures for conducting quality assessment of Affymetrix GeneChip® soybean genome data and for performing analyses to determine differential gene expression using the open-source R programming environment in conjunction with the open-source Bioconductor software. We describe procedures for extracting the Affymetrix probe set IDs on the Affymetrix soybean chip that relate specifically to the soybean genome, and we demonstrate the use of exploratory plots, including images of raw probe-level data, boxplots, density plots and M versus A (MA) plots. RNA degradation and the quality control procedures recommended by Affymetrix are discussed. An appropriate probe-level model also serves as a quality assessment tool. To demonstrate this, we discuss and display chip pseudo-images of weights, residuals and signed residuals, along with additional probe-level modeling plots that may be used to identify aberrant chips. The Robust Multichip Averaging (RMA) procedure is used for background correction, normalization and summarization of the AffyBatch probe-level data to obtain expression-level data, from which differentially expressed genes are identified. Examples of boxplots and MA plots are presented for the expression-level data. Volcano plots and heatmaps are used to demonstrate the use of (log) fold changes in conjunction with ordinary and moderated t-statistics for determining interesting genes. We show, with real data, how functions in R and Bioconductor identified differentially expressed genes that may play a role in soybean resistance to a fungal pathogen, Phakopsora pachyrhizi. Complete source code for performing all quality assessment and statistical procedures may be downloaded from our website: http://css.ncifcrf.gov/services/download/MicroarraySoybean.zip.
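The quantities plotted in a volcano plot can be illustrated with a small sketch. The article's own analysis is carried out in R with Bioconductor; the Python below is only an illustration, not the authors' code. It assumes two groups of log2 expression values for a single gene (for example, RMA output for control and treated chips) and computes the log fold change, the ordinary pooled-variance t-statistic, and a moderated t-statistic in which the gene's variance is shrunk toward a prior value. The function name `gene_statistics` and the parameters `prior_df` and `prior_var` are hypothetical: in an empirical Bayes analysis such as limma's, the prior degrees of freedom and prior variance are estimated from all genes, whereas here they are simply passed in.

```python
import math
from statistics import mean, variance


def gene_statistics(control, treated, prior_df=0.0, prior_var=0.0):
    """Return (log fold change, ordinary t, moderated t) for one gene.

    control, treated: log2 expression values, one per chip (at least two each).
    prior_df, prior_var: assumed prior degrees of freedom and prior variance.
    These are illustrative inputs; they are not estimated from data here.
    """
    n1, n2 = len(control), len(treated)
    # On the log2 scale the fold change is a difference of group means.
    log_fc = mean(treated) - mean(control)

    # Pooled sample variance with n1 + n2 - 2 residual degrees of freedom.
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(control)
                  + (n2 - 1) * variance(treated)) / df
    scale = math.sqrt(1.0 / n1 + 1.0 / n2)

    # Ordinary two-sample t-statistic (undefined if the pooled variance is 0).
    ordinary_t = log_fc / (math.sqrt(pooled_var) * scale)

    # Moderated t: weighted average of the prior and the gene-wise variance.
    shrunk_var = (prior_df * prior_var + df * pooled_var) / (prior_df + df)
    moderated_t = log_fc / (math.sqrt(shrunk_var) * scale)
    return log_fc, ordinary_t, moderated_t


# Example: a gene with small within-group variance and a 2-unit log2 change.
log_fc, t_ord, t_mod = gene_statistics(
    [7.0, 7.2, 6.8], [9.0, 9.2, 8.8], prior_df=4.0, prior_var=0.16
)
```

With a gene whose sample variance is unusually small, the ordinary t-statistic can be large even when the fold change is modest; shrinking the variance toward the prior pulls the moderated statistic back, which is why the two statistics can rank genes differently.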
