R.ROSETTA: a package for analysis of rule-based classification models

ROSETTA is a rough set-based classification toolkit that aims to identify semantics in various data types. Here we present R.ROSETTA, an R package that wraps ROSETTA. The package makes the existing machine learning environment substantially more accessible and its results easier to interpret. It also extends the ROSETTA functions with new components targeting bioinformatics applications: undersampling of imbalanced datasets, estimation of the statistical significance of classification rules, retrieval of support sets, prediction on external data, and integration with rule visualization frameworks. We tested R.ROSETTA on a complex dataset of gene expression measurements for autistic and non-autistic young males. R.ROSETTA facilitated the detection of novel gene-gene interactions, and the results demonstrate the potential of R.ROSETTA classifiers to identify putative biomarkers and novel biological interactions.
