Sampling Defective Pathways in Phenotype Prediction Problems via the Fisher's Ratio Sampler

In this paper, we introduce the Fisher’s ratio sampler, which serves to unravel the defective pathways in highly underdetermined phenotype prediction problems. The sampling algorithm first selects the most discriminatory genes that are also differentially expressed, and then samples highly discriminatory genetic networks, drawing each gene with a prior probability proportional to its individual Fisher’s ratio. The number of genes in each network is set at random, taking into account the length of the minimum-scale signature of the phenotype prediction problem, that is, the signature that contains the most discriminatory genes with the maximum predictive power. The likelihood of each network is established via leave-one-out cross-validation. Finally, the posterior analysis of the most frequently sampled genes serves to establish the defective biological pathways. This novel sampling algorithm is much faster and simpler than Bayesian networks. We show its application to a microarray dataset concerning triple-negative breast cancer (TNBC), a type of breast cancer with a very poor prognosis. In this kind of cancer, the breast cancer cells test negative for human epidermal growth factor receptor 2 (HER-2), estrogen receptors (ER), and progesterone receptors (PR). As a result, common treatments such as hormone therapy and drugs that target estrogen, progesterone, and HER-2 are ineffective. We believe that the genetic pathways identified via the Fisher’s ratio sampler, which are mainly related to signaling, provide new insights into the molecular mechanisms involved in this complex disease. The Fisher’s ratio sampler can also be applied to the genetic analysis of other complex diseases.
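The sampling loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the two-class Fisher's ratio is taken as (mu0 - mu1)^2 / (var0 + var1), the likelihood is the leave-one-out accuracy of a 1-nearest-neighbour classifier, a network is kept when that accuracy reaches a hypothetical threshold `min_acc`, and the hypothetical `max_size` stands in for the bound derived from the minimum-scale signature. The differential-expression (fold-change) filter is omitted for brevity.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-gene Fisher's ratio for two classes (assumed form):
    (mu0 - mu1)^2 / (var0 + var1). X is samples x genes, y holds 0/1 labels."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12  # guard against zero variance
    return num / den

def loocv_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    (the classifier is an assumption of this sketch)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    return float(np.mean(y[d.argmin(axis=1)] == y))

def sample_networks(X, y, n_samples=200, n_top=50, max_size=10,
                    min_acc=0.9, seed=0):
    """Draw small gene networks with prior probability proportional to the
    Fisher's ratio, keep those whose LOOCV accuracy reaches min_acc, and
    return the sampling frequency of each gene among the kept networks."""
    rng = np.random.default_rng(seed)
    fr = fisher_ratio(X, y)
    top = np.argsort(fr)[::-1][:n_top]   # most discriminatory genes
    prior = fr[top] / fr[top].sum()      # prior proportional to Fisher's ratio
    limit = min(max_size, len(top))
    counts = np.zeros(X.shape[1], dtype=int)
    accepted = 0
    for _ in range(n_samples):
        size = int(rng.integers(1, limit + 1))  # random network length
        genes = rng.choice(top, size=size, replace=False, p=prior)
        if loocv_accuracy(X[:, genes], y) >= min_acc:
            counts[genes] += 1
            accepted += 1
    return counts / max(accepted, 1), accepted
```

Under these assumptions, genes with the highest returned frequencies would be the candidates passed on to pathway analysis; the classifier, the acceptance rule, and the size bound can all be swapped without changing the overall structure of the loop.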
