15 THE USE OF GO TERMS TO UNDERSTAND THE BIOLOGICAL SIGNIFICANCE OF MICROARRAY DIFFERENTIAL GENE EXPRESSION DATA

We show one way of using the Gene Ontology (GO) to understand the biological relevance of statistically significant differences in gene expression measured in microarray experiments. To illustrate our methodology, we use the data from Pritchard et al. [2001]. Our approach involves three sequential steps: 1) analyze the data with a linear model and sort the genes according to how much their expression differs between (or among) organs; 2) divide the sorted genes according to how strongly they differ, separating those more highly expressed in one organ from those more highly expressed in the other; 3) compare the relative frequency of each GO term in the two groups with Fisher's exact test, correcting for multiple testing, to identify the GO terms whose frequencies differ significantly between the groups of genes. We repeat steps 2) and 3) with a sliding window that covers all the sorted genes, so that each group of genes is successively compared against all the others. The GO terms provide biological information about the predominant biological processes or molecular functions of the genes that are differentially expressed between organs, making it easier to evaluate the biological relevance of inter-organ differences in the expression of sets of genes. Moreover, when applied to novel situations (e.g., comparing different cancer conditions), the method can provide important hints about the biologically relevant characteristics of the differences between conditions. Finally, the proposed method is easy to apply.
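Steps 2) and 3) and the sliding window can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that step 1) has already produced a list of gene IDs sorted by a signed linear-model statistic (for example, the t-statistic of the organ contrast), that each gene's GO annotations are available as a set of term IDs, that each window is compared against all remaining genes, and that the multiple-testing correction is the Benjamini and Hochberg [1995] false discovery rate. The function names `fisher_exact_two_sided`, `benjamini_hochberg`, and `sliding_window_go`, the `window` and `step` parameters, and the toy data are all hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the observed
    margins that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Probability of each admissible top-left cell value, margins held fixed.
    probs = [comb(row1, x) * comb(n - row1, col1 - x) / total
             for x in range(lo, hi + 1)]
    p_obs = probs[a - lo]
    # A small relative tolerance keeps tables tied with the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sliding_window_go(ranked_genes, annotations, window, step):
    """Compare GO term frequencies inside each window vs. all other genes.

    Assumption: ranked_genes is already sorted by the step-1 statistic;
    this sketch does not fit the linear model itself.
    annotations maps gene ID -> set of GO term IDs.
    Yields (start, {term: (raw p, BH-adjusted p)}) for each window.
    """
    terms = sorted(set().union(*annotations.values()))
    n = len(ranked_genes)
    for start in range(0, n - window + 1, step):
        inside = set(ranked_genes[start:start + window])
        outside = [g for g in ranked_genes if g not in inside]
        raw = []
        for term in terms:
            # 2x2 table: (window vs. rest) x (annotated with term vs. not)
            a = sum(term in annotations.get(g, set()) for g in inside)
            c = sum(term in annotations.get(g, set()) for g in outside)
            raw.append(fisher_exact_two_sided(
                a, len(inside) - a, c, len(outside) - c))
        adjusted = benjamini_hochberg(raw)
        yield start, dict(zip(terms, zip(raw, adjusted)))


if __name__ == "__main__":
    # Hypothetical toy data: six genes sorted by a signed statistic, so the
    # first window holds genes more expressed in one organ and the last
    # window genes more expressed in the other.
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
    go = {"g1": {"termA"}, "g2": {"termA"}, "g3": {"termA", "termB"},
          "g4": {"termB"}, "g5": {"termB"}, "g6": {"termB"}}
    for start, results in sliding_window_go(ranked, go, window=3, step=3):
        print(start, results)
```

Fisher's exact test is computed here directly from the hypergeometric distribution to keep the sketch dependency-free; a library routine such as `scipy.stats.fisher_exact` uses the same two-sided definition. Choosing `step` smaller than `window` gives overlapping windows; the text does not specify either value.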

[1] Dallas E. Johnson et al. Analysis of messy data. 1992.

[2] Yogendra P. Chaubey. Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. 1993.

[3] Y. Benjamini et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing. 1995.

[4] D. Botstein et al. Exploring the new world of the genome with DNA microarrays. Nature Genetics, 1999.

[5] M. Ashburner et al. Gene Ontology: tool for the unification of biology. Nature Genetics, 2000.

[6] G. Churchill et al. Experimental design for gene expression microarrays. Biostatistics, 2001.

[7] G. Churchill et al. Statistical design and the analysis of gene expression microarray data. Genetical Research, 2001.

[8] Y. Benjamini et al. Controlling the false discovery rate in behavior genetics research. Behavioural Brain Research, 2001.

[9] P. Nelson et al. Project normal: defining normal variance in mouse gene expression. Proceedings of the National Academy of Sciences of the United States of America, 2001.

[10] Pierre R. Bushel et al. Assessing Gene Significance from cDNA Microarray Expression Data via Mixed Models. J. Comput. Biol., 2001.

[11] Francis D. Gibbons et al. Judging the quality of gene expression-based clustering methods using gene annotation. Genome Research, 2002.

[12] W. Wong et al. Transitive functional annotation by shortest-path analysis of gene expression data. Proceedings of the National Academy of Sciences of the United States of America, 2002.

[13] Joaquín Dopazo et al. Unsupervised reduction of the dimensionality followed by supervised learning with a perceptron improves the classification of conditions in DNA microarray gene expression data. Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, 2002.

[14] William Stafford Noble et al. Exploring Gene Expression Data with Class Scores. Pacific Symposium on Biocomputing, 2001.

[15] S. Dudoit et al. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. 2002.

[16] S. Dudoit et al. Multiple Hypothesis Testing in Microarray Experiments. 2003.