Incremental interactive mining of constrained association rules from biological annotation data with nominal features

Genomic and proteomic experiments are producing raw data at a rapid pace, while our understanding of that data still lags far behind; consequently, the need to analyze such biological data has become prominent in the current post-genomic era. In this paper we analyze annotated genome data by applying a central data-mining technique, association rule mining, with the aim of discovering rules that yield deeper insight into this type of data. We propose a new technique that uses domain knowledge, expressed as queries, to efficiently mine only the subset of associations that are of interest to the researcher, in an incremental and interactive mode.
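To make the idea of query-constrained rule mining concrete, the following is a minimal sketch and not the technique proposed in the paper. It assumes each gene is a set of nominal "feature=value" items, that a query simply fixes the rule consequent, and that support values are cached so later queries in the same session can reuse them. The data set, the `RuleMiner` class, and all parameter names are hypothetical.

```python
from itertools import combinations

# Hypothetical annotation records: one set of nominal "feature=value" items per gene.
GENES = [
    {"loc=nucleus", "func=transcription", "phenotype=lethal"},
    {"loc=nucleus", "func=transcription", "phenotype=lethal"},
    {"loc=nucleus", "func=transport", "phenotype=viable"},
    {"loc=cytoplasm", "func=transport", "phenotype=viable"},
]


class RuleMiner:
    """Illustrative miner that answers consequent-constrained queries."""

    def __init__(self, records):
        self.records = [frozenset(r) for r in records]
        # Supports computed for one query are kept for later queries.
        self._support_cache = {}

    def support(self, itemset):
        """Fraction of records that contain every item in itemset."""
        key = frozenset(itemset)
        if key not in self._support_cache:
            hits = sum(key <= r for r in self.records)
            self._support_cache[key] = hits / len(self.records)
        return self._support_cache[key]

    def query(self, consequent, min_support, min_confidence, max_len=2):
        """Return rules (antecedent, consequent, support, confidence)."""
        # Constraint applied up front: only items that co-occur with the
        # requested consequent can appear in an antecedent.
        relevant = [r for r in self.records if consequent in r]
        items = set().union(*relevant) - {consequent} if relevant else set()
        rules = []
        for k in range(1, max_len + 1):
            for antecedent in combinations(sorted(items), k):
                sup = self.support(set(antecedent) | {consequent})
                if sup < min_support:
                    continue
                conf = sup / self.support(antecedent)
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, sup, conf))
        return rules


miner = RuleMiner(GENES)
lethal_rules = miner.query("phenotype=lethal", min_support=0.5, min_confidence=0.9)
viable_rules = miner.query("phenotype=viable", min_support=0.5, min_confidence=0.9)
for antecedent, consequent, sup, conf in lethal_rules + viable_rules:
    print(f"{' & '.join(antecedent)} -> {consequent} (sup={sup:.2f}, conf={conf:.2f})")
```

Fixing the consequent is only one simple form a query could take. The sketch enumerates candidate antecedents exhaustively up to `max_len`, which is workable for toy data but would need pruning on realistic annotation sets.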
