An interactive approach to mining gene expression data

Identifying coexpressed genes and coherent patterns in gene expression data is an important task in bioinformatics research and biomedical applications. Several clustering methods have recently been proposed to identify coexpressed genes that share similar coherent patterns. However, there is no objective standard for what constitutes a group of coexpressed genes, and the interpretation of coexpression depends heavily on domain knowledge. Furthermore, groups of coexpressed genes are often highly connected through a large number of "intermediate" genes, so there may be no clear boundaries separating clusters. Clustering gene expression data therefore faces two challenges: satisfying biological domain requirements and handling the high connectivity of the data sets. In this paper, we propose an interactive framework for exploring coherent patterns in gene expression data. A novel coherent pattern index gives users high-confidence indications of the existence of coherent patterns. To derive the coherent pattern index and facilitate clustering, we devise an attraction tree structure that summarizes the coherence information among the genes in a data set. We present efficient and scalable algorithms for constructing attraction trees and coherent pattern indices from gene expression data sets. Our experimental results show that the approach is effective in mining gene expression data and scales to large data sets.
