R.ROSETTA: an interpretable machine learning framework

Motivation: For machine learning to matter beyond intellectual curiosity, the models it produces must be adopted by the wider scientific community. In this study, we developed an interpretable machine learning framework that allows the identification of semantics from various data types. Our package can analyze and illuminate co-predictive mechanisms that reflect biological processes.

Results: We present R.ROSETTA, an R package for building and analyzing interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling. Its results are accessible and transparent, which makes them well suited for adoption by the wider scientific community. The package also provides statistics and visualization tools that help minimize analysis bias and noise. In case-control studies of autism, we showed that the tool generates hypotheses about potential interdependencies among the features that discern the phenotype classes. These interdependencies involved neurodevelopmental and autism-related genes. Our sample application of R.ROSETTA used transcriptomic data, but the package can be applied to any omics data with a decision (class) attribute.

Availability: The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA.

Contact: mateusz.garbulowski@icm.uu.se (Mateusz Garbulowski), jan.komorowski@icm.uu.se (Jan Komorowski)
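Rule-based modelling, as summarized above, describes each class with IF-THEN rules over discretized features. Each rule has its own statistics, and features that co-occur in a rule's IF part suggest interdependencies. The Python sketch below illustrates these three ideas on a hypothetical toy decision table. It is not R.ROSETTA's API, since the package itself is written in R. The rules are hand-written rather than learned. The names `Rule`, `rule_statistics`, `classify`, `feature_pairs`, `geneA` and `geneB` are all invented for illustration. The definitions used are also assumptions for this sketch: support is the number of objects matching the IF part, and accuracy is the fraction of those objects that carry the rule's class.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations

# Hypothetical toy decision table: discretized feature values (condition
# attributes) plus a class label (decision attribute). Invented values only.
DECISION_TABLE = [
    ({"geneA": "up", "geneB": "down"}, "autism"),
    ({"geneA": "up", "geneB": "down"}, "autism"),
    ({"geneA": "up", "geneB": "up"}, "control"),
    ({"geneA": "down", "geneB": "down"}, "control"),
    ({"geneA": "up", "geneB": "down"}, "control"),
    ({"geneA": "down", "geneB": "up"}, "control"),
]


@dataclass(frozen=True)
class Rule:
    """IF all (feature, value) conditions hold THEN predict `decision`."""
    conditions: tuple  # (feature, value) pairs joined by logical AND
    decision: str

    def matches(self, obj):
        return all(obj.get(feature) == value for feature, value in self.conditions)


# Hand-written hypothetical rules (a real tool would learn them from data).
RULES = [
    Rule((("geneA", "up"), ("geneB", "down")), "autism"),
    Rule((("geneA", "down"),), "control"),
    Rule((("geneB", "up"),), "control"),
]


def rule_statistics(rule, table):
    """Return (support, accuracy) of a rule on a decision table (assumed definitions)."""
    labels = [label for obj, label in table if rule.matches(obj)]
    support = len(labels)
    if support == 0:
        return 0, 0.0
    accuracy = sum(label == rule.decision for label in labels) / support
    return support, accuracy


def classify(obj, rules, table):
    """Support-weighted vote among the rules whose IF part matches obj."""
    votes = Counter()
    for rule in rules:
        if rule.matches(obj):
            support, _ = rule_statistics(rule, table)
            votes[rule.decision] += support
    return votes.most_common(1)[0][0] if votes else None


def feature_pairs(rules):
    """Count how often two features appear together in one rule's IF part."""
    pairs = Counter()
    for rule in rules:
        features = sorted(feature for feature, _ in rule.conditions)
        for pair in combinations(features, 2):
            pairs[pair] += 1
    return pairs


if __name__ == "__main__":
    for rule in RULES:
        print(rule, rule_statistics(rule, DECISION_TABLE))
    print(classify({"geneA": "up", "geneB": "down"}, RULES, DECISION_TABLE))
    print(feature_pairs(RULES))
```

The first rule matches three objects, two of which are labelled `autism`, so its accuracy is 2/3. A support-weighted vote is only one simple way to combine rules, and the rule learning and voting schemes used by the package may differ. Counting co-occurring feature pairs is a crude stand-in for the interdependency networks that rule-based tools visualize.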