High Confidence Rule Mining for Microarray Analysis

We present an association rule mining method that extracts high confidence rules describing interesting gene relationships from microarray datasets. Microarray datasets typically contain an order of magnitude more genes than experiments, which makes many data mining methods impractical because they are optimised for sparse datasets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense datasets. These algorithms use the support measure to prune infrequent relationships and so reduce the search space. This reliance on support is a major shortcoming: it prunes many potentially interesting rules that have low support but high confidence. We propose a new row-enumeration rule mining method, MAXCONF, to mine high confidence rules from microarray data. MAXCONF is a support-free algorithm that uses the confidence measure directly to prune the search space. Experiments on three microarray datasets show that MAXCONF outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MAXCONF are substantially more interesting and meaningful than those found by support-based methods.
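The distinction between support and confidence drawn above can be illustrated with a small sketch. It is not the MAXCONF algorithm, only the standard definitions of the two measures applied to a toy dataset. The gene names (`g1` to `g5`), the ten rows, and the 0.3 support threshold are hypothetical values chosen for illustration.

```python
from typing import List, Set


def support(rows: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of rows (experiments) that contain every item in itemset."""
    hits = sum(1 for row in rows if itemset <= row)
    return hits / len(rows)


def confidence(rows: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Fraction of rows containing the antecedent that also contain the consequent."""
    covered = [row for row in rows if antecedent <= row]
    if not covered:
        return 0.0
    return sum(1 for row in covered if consequent <= row) / len(covered)


# Hypothetical toy data: each row is one experiment, listing the genes
# treated as "expressed" in it. The names g1..g5 are placeholders.
rows = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g3", "g4"},
    {"g3", "g4"},
    {"g3", "g4"},
    {"g3", "g4", "g5"},
    {"g3", "g5"},
    {"g3", "g5"},
    {"g5"},
    {"g5"},
]

MIN_SUPPORT = 0.3  # assumed threshold, for illustration only

for antecedent, consequent in [({"g1"}, {"g2"}), ({"g3"}, {"g4"})]:
    sup = support(rows, antecedent | consequent)
    conf = confidence(rows, antecedent, consequent)
    status = "kept" if sup >= MIN_SUPPORT else "pruned by support"
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
          f"support={sup:.2f}, confidence={conf:.2f} ({status})")
```

In this toy data the rule `g1 -> g2` holds in every row where `g1` occurs, yet it appears in only two of ten rows, so a support threshold of 0.3 would discard it. The rule `g3 -> g4` survives the threshold despite its lower confidence.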
