Integration of functional information of genes in fuzzy clustering of short time series gene expression data

Recent studies have shown that incorporating available biological information often leads to biologically more relevant results. Motivated by such studies, we extend a template-based clustering algorithm to incorporate the functional annotation information available for genes. The functional similarity between two genes is calculated from their annotations in the Gene Ontology (GO) database. To this end, three methods of calculating functional similarity are explored. To check the validity of the assumption that biologically and functionally related genes are similar both in their expression profiles and in their GO functional annotation, we have measured the correlation between the average pairwise similarity score and the average membership function value. We observe that Jiang and Conrath's measure is highly correlated with the average membership function value of genes, so we use this measure for further analysis. Incorporating the functional similarity score gives us more choices of objective function for finding the best clustering of gene expression data. We have performed a comparative study to find the combination of objective functions that leads to the most biologically relevant information. We have found that different choices of objective function lead to different sets of templates, while some common templates are identified by all of them. Depending on the aim of the study, we suggest using either all three objectives or the two objectives related to functional similarity and quantization error.
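As an illustration of the kind of functional similarity score discussed above, the following is a minimal sketch of a Jiang and Conrath style measure between two ontology terms. It is not the implementation used in this work. The toy ontology `PARENTS`, the annotation probabilities `PROB`, and all function names are hypothetical, and converting the distance to a similarity via 1 / (1 + d) is one common convention that is assumed here rather than taken from the abstract.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO terms).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical probabilities p(term): fraction of genes annotated to the
# term or to any of its descendants.
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Information content IC(t) = -log p(t)."""
    return -math.log(PROB[term])


def jc_distance(t1, t2):
    """Jiang-Conrath distance: IC(t1) + IC(t2) - 2 * IC(most informative common ancestor)."""
    common = ancestors(t1) & ancestors(t2)
    mica_ic = max(information_content(t) for t in common)
    return information_content(t1) + information_content(t2) - 2.0 * mica_ic


def jc_similarity(t1, t2):
    """Assumed conversion of the distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + jc_distance(t1, t2))


if __name__ == "__main__":
    # Sibling terms under "A" should score higher than terms that only share the root.
    print(jc_similarity("A1", "A2"), jc_similarity("A1", "B"))
```

In this sketch a gene-level score would still have to be built on top of the term-level score, for example by combining the similarities of all term pairs annotated to the two genes; how that combination is done is left open here.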
