Incorporating Biological Knowledge into Density-Based Clustering Analysis of Gene Expression Data

It has been observed that genes with the same function or involved in the same biological process tend to be co-expressed; hence, clustering gene expression profiles provides a means of predicting gene function. Most existing clustering methods ignore known gene functions during clustering, and their results lack stability and biological interpretability. To make full use of the accumulating gene function annotations, we propose combining the density information of genes with known biological knowledge in density-based clustering algorithms, which can yield better clustering results than traditional clustering algorithms. An application to two real datasets demonstrates the advantage of our proposal over the standard method.
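
The abstract does not spell out how annotations enter the density computation, so the following is only a minimal sketch of one way the idea could look, not the authors' algorithm. It assumes a correlation-based distance between expression profiles, a hypothetical `shrink` factor that shortens the distance between two genes sharing a known annotation, and a plain DBSCAN-style density clustering run on the adjusted distances. All function names, parameters, and the toy data are illustrative assumptions.

```python
import math
from collections import deque

def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / (sx * sy)

def knowledge_adjusted_distances(profiles, annotations, shrink=0.3):
    """Pairwise distances, shortened for gene pairs with the same annotation.

    `shrink` is an assumed tuning parameter; None means "no known function".
    """
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = correlation_distance(profiles[i], profiles[j])
            if annotations[i] is not None and annotations[i] == annotations[j]:
                d *= shrink
            dist[i][j] = dist[j][i] = d
    return dist

def dbscan(dist, eps, min_pts):
    """Basic DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n  # None = not yet visited
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbors) < min_pts:
            labels[p] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        queue = deque(neighbors)
        while queue:
            q = queue.popleft()
            if labels[q] == -1:
                labels[q] = cluster  # noise reached from a core point: border
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is a core point: keep expanding
    return labels

# Toy data (invented for illustration): two tight groups plus one gene
# whose profile is only weakly correlated with the first group.
profiles = [
    [1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5],
    [4, 3, 2, 1], [8, 6, 4, 2], [5, 3, 2, 1],
    [1, 3, 1, 3],
]
no_knowledge = [None] * 7
with_knowledge = ["A", "A", None, None, None, None, "A"]

labels_plain = dbscan(knowledge_adjusted_distances(profiles, no_knowledge), eps=0.2, min_pts=3)
labels_known = dbscan(knowledge_adjusted_distances(profiles, with_knowledge), eps=0.2, min_pts=3)
print(labels_plain)
print(labels_known)
```

In this toy setting, the last gene is left as noise when annotations are ignored and is pulled into the first cluster once it shares an annotation with two of that cluster's genes. How strongly prior knowledge should override the expression data is a design choice that this sketch reduces to a single assumed factor.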
