R.ROSETTA: an interpretable machine learning framework

Abstract: ROSETTA is a rough set-based classification toolkit that aims to identify semantics in various data types. Here we present R.ROSETTA, an R package that wraps ROSETTA. The package substantially improves the accessibility of the existing machine learning environment and the interpretability of its results. The ROSETTA functions have been extended with novel components targeting bioinformatics applications: undersampling of imbalanced datasets, estimation of the statistical significance of classification rules, retrieval of support sets, prediction on external data, and integration with rule visualization frameworks. We tested R.ROSETTA on a complex dataset of gene expression measurements for autistic and non-autistic young males. R.ROSETTA facilitated the detection of novel gene-gene interactions, and the results demonstrate the potential of its classifiers to identify putative biomarkers and novel biological interactions.
