Microarray gene expression data association rules mining based on BSC-tree and FIS-tree

We propose using association rules to mine relationships among genes under the same experimental conditions; such relationships may also hold across many experiments with varying conditions. We present a new approach, called FIS-tree mining, for mining microarray data. It uses two new data structures, the BSC-tree and the FIS-tree, together with a data partition format for gene expression level data. These two structures make it possible to mine association rules from a gene expression database efficiently. We tested the algorithm on two real-life gene expression databases, available from Stanford University and Harvard Medical School, and it performed better than two existing algorithms, Apriori and FP-Growth.
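
To make the mining task concrete, the following minimal sketch shows what an association rule over gene expression levels looks like. It is not the FIS-tree method and uses neither the BSC-tree nor the FIS-tree, whose details are not given in this abstract; it is a brute-force support/confidence computation over gene pairs. The thresholds, gene names, expression values, and function names are all hypothetical.

```python
from itertools import combinations

def discretize(value, low=-1.0, high=1.0):
    """Map an expression value to 'down', 'flat', or 'up' (thresholds are assumed)."""
    if value <= low:
        return "down"
    if value >= high:
        return "up"
    return "flat"

def to_transactions(matrix, genes):
    """Turn each experiment (row) into a set of (gene, level) items, dropping 'flat'."""
    transactions = []
    for row in matrix:
        items = {(g, discretize(v)) for g, v in zip(genes, row)}
        transactions.append({item for item in items if item[1] != "flat"})
    return transactions

def support(itemset, transactions):
    """Fraction of experiments that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Brute-force rules lhs -> rhs over item pairs (illustrative only)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = s / support({lhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules

# Hypothetical data: rows are experiments, columns are genes.
genes = ["g1", "g2", "g3"]
matrix = [
    [1.5, 1.2, -0.2],
    [1.1, 1.4, -1.3],
    [0.1, 0.2, -1.5],
    [1.8, 1.0, 0.3],
]
transactions = to_transactions(matrix, genes)
rules = pair_rules(transactions)
for lhs, rhs, s, conf in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={conf:.2f}")
```

On this toy input, g1 and g2 are both "up" in three of the four experiments, so the only rules that pass the thresholds link those two items. Enumerating pairs this way scales poorly with the number of genes, which is the cost that dedicated structures such as those named in the abstract aim to reduce.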
