An Association Rule Mining Approach for Co-Regulated Signature Genes Identification in Cancer

When a normal cell becomes cancerous, the expression of many of its genes changes. Identifying these changes in gene expression in cancer tissue may lead to novel tools for early diagnosis and effective therapeutics. In this paper, we present an association rule mining approach to identify associations among genes that are differentially expressed in cancer tissue compared with normal tissue. We design an association rule mining algorithm, GeneExpMiner, for mining gene expression data. Serial Analysis of Gene Expression (SAGE) data from pancreatic cancer are used to demonstrate the approach. The approach is expected to help in developing better cancer treatments and in designing low-cost microarray chips for cancer diagnosis. The results have been validated using Gene Ontology, and the signature genes we identified match published data.
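The abstract does not describe how GeneExpMiner works internally, so the following is only a generic sketch of the underlying idea, not GeneExpMiner itself. It assumes that each cancer SAGE library is turned into a "transaction" of items such as `G1_up` or `G2_down` by a simple fold-change cutoff against the mean of the normal libraries, and then mines frequent itemsets with a plain Apriori-style level-wise search before generating rules by confidence. The gene names, tag counts, the cutoff of 2, the pseudocount of 1, and the support/confidence thresholds are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data (not real measurements): mean SAGE tag counts in
# normal libraries, and per-library tag counts in cancer libraries.
NORMAL_MEAN = {"G1": 10, "G2": 50, "G3": 20, "G4": 5}
CANCER = [
    {"G1": 40, "G2": 10, "G3": 20, "G4": 5},
    {"G1": 35, "G2": 12, "G3": 60, "G4": 6},
    {"G1": 50, "G2": 8, "G3": 18, "G4": 4},
    {"G1": 9, "G2": 45, "G3": 70, "G4": 5},
]


def to_transactions(cancer, normal_mean, fold=2.0):
    """Turn each cancer library into a set of items like 'G1_up' or 'G2_down'.

    A gene is 'up' if its ratio to the normal mean is >= fold and 'down' if it
    is <= 1/fold. fold=2.0 is an assumed cutoff, not taken from the paper.
    """
    transactions = []
    for library in cancer:
        items = set()
        for gene, count in library.items():
            # Add a pseudocount of 1 to both sides to avoid division by zero.
            ratio = (count + 1.0) / (normal_mean[gene] + 1.0)
            if ratio >= fold:
                items.add(f"{gene}_up")
            elif ratio <= 1.0 / fold:
                items.add(f"{gene}_down")
        transactions.append(frozenset(items))
    return transactions


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {itemset: support}."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while level:
        current = {s: support(s) for s in level}
        current = {s: sup for s, sup in current.items() if sup >= min_support}
        result.update(current)
        k += 1
        keys = list(current)
        # Join frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune candidates that have any infrequent (k-1)-subset.
        level = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
    return result


def association_rules(frequent, min_conf):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so lhs is in the dict.
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


transactions = to_transactions(CANCER, NORMAL_MEAN)
frequent = frequent_itemsets(transactions, min_support=0.5)
for lhs, rhs, sup, conf in association_rules(frequent, min_conf=0.8):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

On this toy data, the only rules reported are `G1_up -> G2_down` and its converse, each with support 0.75 and confidence 1.0, i.e. a pair of genes that change together across the cancer libraries. The level-wise search is used here only because it is the simplest complete frequent-itemset method to show; a real gene expression miner would likely use a more scalable strategy.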
