High Confidence Rule Mining for Microarray Analysis

We present an association rule mining method for mining high-confidence rules, which describe interesting gene relationships from microarray data sets. Microarray data sets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical as they are optimized for sparse data sets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense data sets. These algorithms rely on pruning infrequent relationships to reduce the search space by using the support measure. This major shortcoming results in the pruning of many potentially interesting rules with low support but high confidence. We propose a new row-enumeration rule mining method, MaxConf, to mine high-confidence rules from microarray data. MAXCONF is a support-free algorithm that directly uses the confidence measure to effectively prune the search space. Experiments on three microarray data sets show that MaxConf outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach-the rules discovered by MaxConf are substantially more interesting and meaningful compared with support-based methods.

[1]  Anthony K. H. Tung,et al.  Mining frequent closed patterns in microarray data , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[2]  T. Hughes,et al.  Exploration of Essential Gene Functions via Titratable Promoter Alleles , 2004, Cell.

[3]  Satoru Miyano,et al.  Identification of genetic networks by strategic gene disruptions and gene overexpressions under a boolean model , 2003, Theor. Comput. Sci..

[4]  T. Speed,et al.  GOstat: find statistically overrepresented Gene Ontologies within a group of genes. , 2004, Bioinformatics.

[5]  Michael Ruogu Zhang,et al.  Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. , 1998, Molecular biology of the cell.

[6]  A. Romeo,et al.  Regulation of High Affinity Iron Uptake in the YeastSaccharomyces cerevisiae , 1998, The Journal of Biological Chemistry.

[7]  Anthony K. H. Tung,et al.  Mining top-K covering rule groups for gene expression data , 2005, SIGMOD '05.

[8]  Ian M. Donaldson,et al.  The Biomolecular Interaction Network Database and related tools 2005 update , 2004, Nucleic Acids Res..

[9]  Anthony K. H. Tung,et al.  FARMER: finding interesting rule groups in microarray datasets , 2004, SIGMOD '04.

[10]  Sanjay Chawla,et al.  On discovery of maximal confident rules without support pruning in microarray data , 2005, BIOKDD.

[11]  Satoru Miyano,et al.  Identification of Genetic Networks from a Small Number of Gene Expression Patterns Under the Boolean Network Model , 1998, Pacific Symposium on Biocomputing.

[12]  D. Winge,et al.  Metalloregulation of FRE1 and FRE2Homologs in Saccharomyces cerevisiae * , 1998, The Journal of Biological Chemistry.

[13]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[14]  B Marshall,et al.  Gene Ontology Consortium: The Gene Ontology (GO) database and informatics resource , 2004, Nucleic Acids Res..

[15]  Anthony K. H. Tung,et al.  Carpenter: finding closed patterns in long biological datasets , 2003, KDD '03.

[16]  Satoru Miyano,et al.  Inferring qualitative relations in genetic networks and metabolic pathways , 2000, Bioinform..

[17]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[18]  Mohammed J. Zaki,et al.  Efficient algorithms for mining closed itemsets and their lattice structure , 2005, IEEE Transactions on Knowledge and Data Engineering.

[19]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[20]  Jian Pei,et al.  CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[21]  Yudong D. He,et al.  Functional Discovery via a Compendium of Expression Profiles , 2000, Cell.

[22]  Jian Pei,et al.  Towards data mining benchmarking: a test bed for performance study of frequent pattern mining , 2000, SIGMOD 2000.

[23]  Hui Xiong,et al.  Mining confident co-location rules without a support threshold , 2003, SAC '03.

[24]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Association Rule Mining , 2007 .

[25]  Gene Ontology Consortium The Gene Ontology (GO) database and informatics resource , 2003 .

[26]  H. Boucherie,et al.  The Snf1 Protein Kinase Controls the Induction of Genes of the Iron Uptake Pathway at the Diauxic Shift in Saccharomyces cerevisiae* , 2003, Journal of Biological Chemistry.

[27]  D. Botstein,et al.  Genomic expression programs in the response of yeast cells to environmental changes. , 2000, Molecular biology of the cell.

[28]  Chad Creighton,et al.  Mining gene expression databases for association rules , 2003, Bioinform..

[29]  Aidong Zhang,et al.  Cluster analysis for gene expression data: a survey , 2004, IEEE Transactions on Knowledge and Data Engineering.