Unsupervised Identifying Diagnostic Genes and Specific Phenotypes from Microarray Data

In this paper, we explore the new problem of simultaneously mining diagnostic genes and specific phenotypes from microarray data using an unsupervised method. To address this problem, we propose a novel type of cluster called the LC-Cluster. The idea behind the solution is motivated by recent biological discoveries and originates from existing bicluster models and emerging patterns, but it differs substantially from both. We also design two efficient tree-based algorithms, FALCONER and E-FALCONER, to mine all such maximal clusters. Extensive experiments on several real and synthetic datasets show that (1) our approaches are efficient and effective, (2) our approaches outperform the existing enumeration tree-based algorithm, and (3) our approaches can discover a large number of LC-Clusters, which are potentially of high biological significance.