Ontology-Driven Co-clustering of Gene Expression Data

The huge volume of gene expression data produced by microarrays and other high-throughput techniques has encouraged the development of new computational techniques to evaluate the data and to formulate new biological hypotheses. To this purpose, co-clustering techniques are widely used: these identify groups of genes that show similar activity patterns under a specific subset of the experimental conditions by measuring the similarity in expression within these groups. However, in many applications, distance metrics based only on expression levels fail in capturing biologically meaningful clusters. We propose a methodology in which a standard expression-based co-clustering algorithm is enhanced by sets of constraints which take into account the similarity/dissimilarity (inferred by the Gene Ontology, GO) between pairs of genes. Our approach minimizes the intervention of the analyst within the co-clustering process. It provides meaningful co-clusters whose discovery and interpretation is increased by embedding GO annotations.

[1]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[2]  Philip Chan,et al.  Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms , 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.

[3]  Carsten Wiuf,et al.  Co-clustering and visualization of gene expression data and gene ontology terms for Saccharomyces cerevisiae using self-organizing maps , 2007, J. Biomed. Informatics.

[4]  Inderjit S. Dhillon,et al.  Minimum Sum-Squared Residue Co-Clustering of Gene Expression Data , 2004, SDM.

[5]  Daniel Hanisch,et al.  Co-clustering of biological networks and gene expression data , 2002, ISMB.

[6]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[7]  Joachim Selbig,et al.  Hypothesis-driven approach to predict transcriptional units from gene expression data , 2004, Bioinform..

[8]  Arlindo L. Oliveira,et al.  Biclustering algorithms for biological data analysis: a survey , 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[9]  George M. Church,et al.  Biclustering of Expression Data , 2000, ISMB.

[10]  Ruggero G. Pensa,et al.  Constrained Co-clustering of Gene Expression Data , 2008, SDM.

[11]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[12]  Marco Botta,et al.  A new protein motif extraction framework based on constrained co-clustering , 2009, SAC '09.