A graph-theoretic modeling on GO space for biological interpretation of gene clusters

MOTIVATION: With the advent of DNA microarray technologies, the parallel quantification of genome-wide transcription has offered a great opportunity to understand complicated biological phenomena systematically. In the analysis of the resulting gene expression data, clustering methods have been useful tools for uncovering meaningful patterns hidden in the data. These mathematical techniques, however, rely entirely on numerical expression values and therefore provide no biologically relevant information about the clustering results.

RESULTS: We present a novel methodology for the biological interpretation of gene clusters. Our graph-theoretic algorithm extracts the common biological attributes of the genes within a cluster or group of interest by means of a modified Gene Ontology (GO) structure called the GO tree. After genes are annotated with GO terms, the hierarchical nature of the terms is used to find the representative biological meanings of the gene clusters. In addition, the biological significance of gene clusters can be assessed quantitatively by defining a distance function on the GO tree. Our approach complements statistical clustering techniques: it views the clustering problem from a different perspective by using a biological ontology. We applied the algorithm to a well-known data set and obtained the biological features of the gene clusters, together with a quantitative assessment of clustering quality based on GO Biological Process.
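The two ideas in the RESULTS paragraph, a representative term for a cluster and a distance on the GO tree, can be illustrated with a minimal sketch. The abstract does not define the GO tree construction or the distance function, so the code below makes its own assumptions: the ontology is a toy tree stored as child-to-parent links, each gene carries exactly one annotation, the representative term is taken to be the deepest common ancestor, and the distance is the number of edges on the tree path between two terms. All term names, gene names, and function names are hypothetical and are not taken from the paper or from the real GO hierarchy.

```python
from itertools import combinations

# Toy tree as child -> parent links (illustrative only, not the real GO structure).
PARENT = {
    "metabolism": "biological_process",
    "cell_cycle": "biological_process",
    "carbohydrate_metabolism": "metabolism",
    "lipid_metabolism": "metabolism",
    "glycolysis": "carbohydrate_metabolism",
    "gluconeogenesis": "carbohydrate_metabolism",
    "mitosis": "cell_cycle",
}

# Hypothetical annotations: one term per gene (an assumption made for brevity).
ANNOTATION = {
    "geneA": "glycolysis",
    "geneB": "gluconeogenesis",
    "geneC": "lipid_metabolism",
    "geneD": "mitosis",
}


def path_to_root(term):
    """Return the list of terms from `term` up to the root, inclusive."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def common_ancestor(terms):
    """Deepest term lying on every term's path to the root."""
    paths = [path_to_root(t) for t in terms]
    shared = set(paths[0]).intersection(*map(set, paths[1:]))
    # Walking upward from the first term, the first shared term is the deepest.
    return next(t for t in paths[0] if t in shared)


def tree_distance(a, b):
    """Number of edges on the tree path between terms a and b (assumed metric)."""
    lca = common_ancestor([a, b])
    return path_to_root(a).index(lca) + path_to_root(b).index(lca)


def mean_pairwise_distance(genes):
    """Average tree distance over all gene pairs; lower means a tighter cluster."""
    terms = [ANNOTATION[g] for g in genes]
    pairs = list(combinations(terms, 2))
    if not pairs:
        return 0.0
    return sum(tree_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    cluster = ["geneA", "geneB", "geneC"]
    print(common_ancestor([ANNOTATION[g] for g in cluster]))
    print(mean_pairwise_distance(cluster))
```

Under these assumptions, a cluster whose genes share a deep common ancestor and a small mean pairwise distance would be read as biologically coherent, while a cluster whose only common ancestor is the root would not. Handling genes with several annotations, or terms with several parents, would require choices that the abstract does not describe.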
