Explaining Ovarian Cancer Gene Expression Profiles with Fuzzy Rules and Genetic Algorithms

The analysis of gene expression data is a complex task, and many tools and pipelines are available to handle large sequencing datasets in case-control (two-group) studies. In some cases, such as pilot or exploratory studies, the researcher needs to compare more than two groups of samples, each consisting of only a few replicates. Both standard statistical bioinformatics pipelines and innovative deep learning models are ill-suited to extracting interpretable patterns and information from such small, multi-class datasets. In this work, we apply a combination of fuzzy rule systems and genetic algorithms to analyze a dataset of 21 samples in 6 classes, which is useful for studying expression profiles in ovarian cancer compared with other ovarian diseases. The proposed method performs a gene-level feature selection guided by the genetic algorithm and builds a set of if-then rules that explain how the classes can be distinguished by changes in the expression of the selected genes. After testing several parameter settings, the final model consists of 10 genes involved in cancer-related molecular pathways and 10 rules that correctly classify all samples.
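The combination described above, a genetic algorithm that selects genes while a fuzzy rule base is built on the selected genes and scores each selection, can be illustrated with a minimal sketch. This is not the authors' implementation; every specific below is a hypothetical choice made for illustration:

- three triangular fuzzy sets (low/medium/high) on expression values rescaled to [0, 1];
- Wang-Mendel-style rule induction with winner-takes-all inference;
- a per-gene penalty in the fitness;
- the GA settings (population size, size-3 tournaments, uniform crossover, mutation rate);
- `make_toy_data`, which generates synthetic data rather than the ovarian dataset.

```python
import random

# Hypothetical fuzzy partition: three triangular sets on expression rescaled to [0, 1].
LABELS = ("low", "medium", "high")
CENTERS = {"low": 0.0, "medium": 0.5, "high": 1.0}


def membership(x, label):
    """Triangular membership degree of x in a linguistic label (half-width 0.5)."""
    return max(0.0, 1.0 - abs(x - CENTERS[label]) / 0.5)


def strength(sample, genes, antecedent):
    """Firing strength of a rule antecedent: product t-norm over the selected genes."""
    s = 1.0
    for g, label in zip(genes, antecedent):
        s *= membership(sample[g], label)
    return s


def learn_rules(X, y, genes):
    """Wang-Mendel-style induction: every sample proposes one rule from its
    best-matching labels; among identical antecedents the strongest rule wins."""
    best = {}
    for sample, cls in zip(X, y):
        antecedent = tuple(
            max(LABELS, key=lambda lab: membership(sample[g], lab)) for g in genes
        )
        degree = strength(sample, genes, antecedent)
        if antecedent not in best or degree > best[antecedent][1]:
            best[antecedent] = (cls, degree)
    return {ant: cls for ant, (cls, _) in best.items()}


def classify(sample, rules, genes):
    """Winner-takes-all inference: the most strongly fired rule assigns the class."""
    return max(rules.items(), key=lambda rule: strength(sample, genes, rule[0]))[1]


def fitness(mask, X, y, penalty=0.01):
    """Training accuracy of the rule base built on the masked genes, minus a
    hypothetical per-gene penalty that favours compact, more readable models."""
    genes = [i for i, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    rules = learn_rules(X, y, genes)
    correct = sum(classify(s, rules, genes) == c for s, c in zip(X, y))
    return correct / len(y) - penalty * len(genes)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Binary-chromosome GA: bit i set means gene i may appear in the rules.
    Returns the indices of the genes selected by the fittest chromosome."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # memoise: each gene subset is evaluated once
            cache[key] = fitness(mask, X, y)
        return cache[key]

    pop = [[rng.random() < 0.3 for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = sorted(pop, key=score, reverse=True)[:2]  # elitism: keep the 2 best
        while len(nxt) < pop_size:
            # Two size-3 tournaments pick the parents.
            a, b = (max(rng.sample(pop, 3), key=score) for _ in range(2))
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]  # uniform crossover
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return [i for i, bit in enumerate(best) if bit]


def make_toy_data(seed=1):
    """Synthetic toy data (NOT the ovarian dataset): 3 classes x 4 replicates,
    6 genes in [0, 1]; only gene 0 carries class information."""
    rng = random.Random(seed)
    levels = {0: (0.05, 0.10, 0.15, 0.10), 1: (0.45, 0.50, 0.55, 0.50),
              2: (0.85, 0.90, 0.95, 0.90)}
    X, y = [], []
    for cls, values in levels.items():
        for v in values:
            X.append([v] + [rng.random() for _ in range(5)])
            y.append(cls)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    genes = ga_select(X, y)
    for antecedent, cls in learn_rules(X, y, genes).items():
        terms = " AND ".join(f"gene{g} is {lab}" for g, lab in zip(genes, antecedent))
        print(f"IF {terms} THEN class {cls}")
```

Because the rule base is rebuilt inside the fitness function, the gene selection acts as a wrapper: each candidate subset is judged by how well the rules learned on it classify the samples, and the penalty nudges the search toward short, readable rule sets. With only a few replicates per class, training accuracy rewards memorisation, so a leave-one-out score would likely be a safer fitness in practice.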
