Discovering relational-based association rules with multiple minimum supports on microarray datasets

MOTIVATION Association rule analysis is an important technique for finding expression relationships between genes in gene expression data. However, previous methods either implicitly assume that all genes have similar importance or ignore the individual importance of each gene, and the relation intensity between any two items has never been taken into consideration. To tackle this problem, we propose the REMMAR (RElational-based Multiple Minimum supports Association Rules) algorithm. REMMAR adjusts the minimum relation support (MRS) for each gene pair according to the intensity of the regulatory relation between the two genes, so that it discovers more important association rules with stronger biological meaning.

RESULTS In the case study, REMMAR used the shortest distance between any two genes in the Saccharomyces cerevisiae gene regulatory network (GRN) as the relation intensity and mined association rules from two S. cerevisiae gene expression datasets. In the experimental evaluation, REMMAR generated more rules with stronger relation intensity and filtered out rules without biological meaning in the protein-protein interaction network (PPIN). Furthermore, when the discovered rules were checked against a literature survey, the proposed method achieved a higher precision (100%) than the reference Apriori method (87.5%). The REMMAR algorithm can therefore discover association rules with strong biological relationships that traditional methods miss, assisting biologists in complicated genetic exploration.
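The core idea, a per-pair support threshold that depends on the shortest distance between two genes in a network, can be illustrated with a small sketch. This is not the REMMAR implementation: the abstract does not give the MRS formula, so the linear mapping in `min_relation_support`, its `base`, `floor`, and `max_dist` parameters, the treatment of the network as undirected, and the toy gene names `g1` to `g4` are all assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations


def shortest_distances(graph, source):
    """Breadth-first search: hop counts from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def min_relation_support(distance, base=0.6, floor=0.2, max_dist=3):
    """Hypothetical MRS: closer gene pairs get a lower support threshold.

    Unreachable pairs (distance is None) and pairs farther apart than
    `max_dist` keep the strict `base` threshold. Assumes max_dist > 1.
    """
    if distance is None or distance > max_dist:
        return base
    # Linear interpolation: distance 1 -> floor, distance max_dist -> base.
    return floor + (base - floor) * (distance - 1) / (max_dist - 1)


def frequent_pairs(transactions, graph):
    """Keep gene pairs whose support meets their own pair-specific threshold."""
    n = len(transactions)
    genes = sorted(set().union(*transactions))
    kept = {}
    for a, b in combinations(genes, 2):
        support = sum(1 for t in transactions if a in t and b in t) / n
        distance = shortest_distances(graph, a).get(b)
        if support >= min_relation_support(distance):
            kept[(a, b)] = support
    return kept


# Toy regulatory network (undirected adjacency sets); g4 is isolated.
graph = {"g1": {"g2"}, "g2": {"g1", "g3"}, "g3": {"g2"}, "g4": set()}

# Each transaction is the set of genes expressed together in one sample.
transactions = [
    {"g1", "g2"},
    {"g1", "g2", "g4"},
    {"g2", "g3", "g4"},
    {"g1", "g4"},
    {"g2", "g3"},
]

pairs = frequent_pairs(transactions, graph)
```

In this toy data the pairs (g1, g2), (g2, g3), (g1, g4), and (g2, g4) all co-occur in two of five samples. Only the first two are adjacent in the network, so only they pass their lowered thresholds, while the pairs involving the isolated g4 face the stricter default and are dropped. A single global minimum support could not separate these equally frequent pairs.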
