Unsupervised discovery of fuzzy patterns in gene expression data

Discovering patterns in gene expression levels is treated as a classification problem when the tissue classes of the samples are given. It is then solved as a discrete-data problem by discretizing the expression levels of each gene into intervals that maximize the interdependence between that gene and the class labels. However, when class information is unavailable, discovering gene expression patterns becomes difficult. This paper addresses that problem. For a gene pool with a large number of genes, we first cluster the genes into smaller groups. In each group, we use the representative gene, the one with the highest interdependence with the others in the group, to drive the discretization of the expression levels of the other genes. Treating the resulting intervals as discrete events, we can then discover association patterns. If the gene groups obtained are crisp clusters, however, significant patterns that span different clusters cannot be found. This paper presents a new method of “fuzzifying” the crisp attribute clusters for that purpose. To evaluate the effectiveness of our approach, we apply the procedure described above first to a synthetic dataset and then to a gene expression dataset with known class labels. The class labels are not used in either analysis; they are used only afterwards, as ground truth in a classification problem, to assess the algorithm's effectiveness in fuzzy gene clustering and discretization. The results show the efficacy of the proposed method.
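The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's stated method: interdependence is taken to be mutual information normalized by joint entropy, genes are pre-discretized with equal-frequency bins rather than with a representative-driven discretization, the crisp clusters are assumed to be given, and fuzzy membership is taken as proportional to a gene's interdependence with each cluster's representative. All function names are hypothetical.

```python
import numpy as np

def equal_frequency_bins(x, n_bins=3):
    """Discretize a 1-D array into n_bins intervals with roughly equal counts."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

def interdependence(a, b):
    """Assumed measure: mutual information / joint entropy, which lies in [0, 1]."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)          # contingency counts
    joint /= joint.sum()                 # joint probabilities
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz]))
    h = -np.sum(joint[nz] * np.log(joint[nz]))
    return mi / h if h > 0 else 0.0

def representative(disc, group):
    """Gene in `group` with the highest summed interdependence with the others."""
    scores = [sum(interdependence(disc[g], disc[h]) for h in group if h != g)
              for g in group]
    return group[int(np.argmax(scores))]

def fuzzy_memberships(disc, reps):
    """Assumed fuzzification: membership proportional to interdependence
    with each cluster's representative gene; each row sums to 1."""
    r = np.array([[interdependence(disc[g], disc[rep]) for rep in reps]
                  for g in range(len(disc))])
    totals = r.sum(axis=1, keepdims=True)
    uniform = np.full_like(r, 1.0 / len(reps))
    return np.divide(r, totals, out=uniform, where=totals > 0)

# Synthetic example: genes 0-2 follow signal a, genes 3-5 follow signal b,
# and gene 6 depends on both, so it should not belong to one cluster only.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 200))
expr = np.stack(
    [a + 0.1 * rng.normal(size=200) for _ in range(3)]
    + [b + 0.1 * rng.normal(size=200) for _ in range(3)]
    + [a + b]
)
disc = np.array([equal_frequency_bins(row) for row in expr])
groups = [[0, 1, 2], [3, 4, 5]]          # crisp clusters, assumed given
reps = [representative(disc, g) for g in groups]
u = fuzzy_memberships(disc, reps)
print("representatives:", reps)
print(np.round(u, 2))
```

In this sketch, a gene whose membership row is spread across both columns is a candidate for patterns that span clusters, which is the situation the abstract says crisp clustering cannot capture.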
