Paper information - Mining cross-graph quasi-cliques in gene expression and protein interaction data

A protein is the product of a gene. From gene expression data, we can find co-expressed genes, that is, groups of genes that show coherent expression patterns across samples. From protein interaction data, we can find groups of proteins that frequently interact with each other. By mining gene expression data and protein interaction data jointly, we may find clusters of genes that are co-expressed and whose proteins also interact. Such clusters are interesting and meaningful for at least two reasons. First, both the gene expression data and the protein interaction data are very noisy. A cluster confirmed by both data sets strongly indicates a correlation or connection among its genes. In other words, clusters found by joint mining are more reliable, and we can have high confidence that the genes in such a cluster are regulated by the same mechanism or belong to the same biological process. Second, although closely related, gene expression data and protein interaction data carry different biological meanings. The coincidence of co-expressed genes and interacting proteins is therefore biologically significant. As indicated in [5], many pathways exhibit two properties: their genes exhibit similar gene expression profiles, and the protein products of the genes often interact.
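To make the idea of joint mining concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the commonly used degree-based notion of a quasi-clique: a gene set S is a γ-quasi-clique in a graph if every gene in S is adjacent to at least ⌈γ(|S| - 1)⌉ other genes in S. A set is then treated as a cross-graph quasi-clique if it passes this test in both the co-expression graph and the protein interaction graph. The function names, the threshold 0.8, and the toy gene names and edges are all hypothetical.

```python
import math

def to_adj(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_quasi_clique(adj, genes, gamma):
    """True if every gene has at least ceil(gamma * (|S| - 1)) neighbours inside S.

    This degree-based definition is an assumption made for this sketch.
    """
    genes = set(genes)
    need = math.ceil(gamma * (len(genes) - 1))
    return all(len(adj.get(g, set()) & (genes - {g})) >= need for g in genes)

def is_cross_graph_quasi_clique(graphs, genes, gamma):
    """True if the gene set is a gamma-quasi-clique in every graph supplied."""
    return all(is_quasi_clique(adj, genes, gamma) for adj in graphs)

# Hypothetical toy data: one graph per data source, over the same gene names.
coexpr = to_adj([("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4")])
ppi = to_adj([("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g1", "g4")])

small = {"g1", "g2", "g3"}
large = {"g1", "g2", "g3", "g4"}
print(is_cross_graph_quasi_clique([coexpr, ppi], small, 0.8))
print(is_cross_graph_quasi_clique([coexpr, ppi], large, 0.8))
```

In this toy data, g4 has only a single neighbour in each graph, so the larger set fails the test while the three-gene set is dense in both graphs. The sketch only checks a given candidate set; enumerating all maximal sets efficiently is the harder mining problem and is not attempted here.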

Jian Pei | Aidong Zhang | Daxin Jiang

[1] Forouzan Golshani, et al. Proceedings of the Eighth International Conference on Data Engineering, 1992.

[2] Haidong Wang, et al. Discovering molecular pathways from protein interaction and gene expression data, 2003, ISMB.

[3] David Page, et al. Biological applications of multi-relational data mining, 2003, SKDD.

[4] Jian Pei, et al. On mining cross-graph quasi-cliques, 2005, KDD '05.

[5] George M. Church, et al. Biclustering of Expression Data, 2000, ISMB.

[6] Raj Acharya, et al. An information theoretic approach for analyzing temporal patterns of gene expression, 2003, Bioinform.

[7] Philip S. Yu, et al. Clustering by pattern similarity in large data sets, 2002, SIGMOD '02.