Rule-based functional description of genes – Estimation of the multicriteria rule interestingness measure by the UTA method

Abstract In this paper we present a new extension of the RuleGO rule generation method. The method was designed to discover logical rules whose premises contain combinations of GO terms, in order to provide a functional description of the analyzed gene signatures. As the number of rules obtained is typically huge, a filtration algorithm is required to select only the most interesting ones. The rule interestingness measures currently used within RuleGO do not always allow rules to be selected according to the user's subjective preferences. We therefore propose applying the UTA method to estimate a multicriteria rule interestingness measure that reflects an expert's subjective rule evaluation. In the presented method, each rule is characterized by a vector of values reflecting its quality with respect to the different partial interestingness measures. From the generated set of rules, a subset of representative rules is selected and presented to an expert, who orders them according to his or her preferences. Using this order and the values of the partial interestingness measures, an additive multicriteria interestingness measure is estimated in such a way that the rule ranking it produces is consistent with the ranking given by the expert. The approach is applied to three microarray data sets, and the resulting rule orders are compared with those generated by the standard RuleGO rule evaluation method. The presented method yields a rule ranking that is better correlated with the expert ranking than the ranking obtained in the standard way.
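
The estimation step described in the abstract can be sketched as the linear program of the standard UTA formulation. The notation below is assumed for illustration and is not taken from the paper: $r_1, \dots, r_m$ are the representative rules listed in the expert's order (most preferred first, strict preferences only), $g_i(r)$ is the value of the $i$-th partial interestingness measure for rule $r$, $u_i$ is a non-decreasing piecewise-linear marginal function, $\sigma(r_k)$ is an error term, and $\delta > 0$ is a small preference threshold.

```latex
\begin{align*}
U(r) &= \sum_{i=1}^{n} u_i\bigl(g_i(r)\bigr)
  && \text{(additive multicriteria measure)} \\[4pt]
\min\ & \sum_{k=1}^{m} \sigma(r_k) \\
\text{s.t.}\ 
& \bigl[U(r_k) + \sigma(r_k)\bigr] - \bigl[U(r_{k+1}) + \sigma(r_{k+1})\bigr] \ge \delta,
  && k = 1, \dots, m-1, \\
& u_i \text{ non-decreasing on } \bigl[g_i^{\min},\, g_i^{\max}\bigr],
  && i = 1, \dots, n, \\
& u_i\bigl(g_i^{\min}\bigr) = 0,
  && i = 1, \dots, n, \\
& \sum_{i=1}^{n} u_i\bigl(g_i^{\max}\bigr) = 1, \\
& \sigma(r_k) \ge 0,
  && k = 1, \dots, m.
\end{align*}
```

If the optimal objective value is zero, the estimated measure $U$ reproduces the expert's order exactly; a positive value indicates that no additive function of this form is fully consistent with that order, and the $\sigma(r_k)$ show which rules are misplaced. Rules the expert considers equally interesting would be handled by replacing the corresponding inequality with an equality.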
