XGSEA: CROSS-species gene set enrichment analysis via domain adaptation

MOTIVATION: Gene set enrichment analysis (GSEA) is widely used to identify, from a large collection of gene sets, those that differ significantly between cases and controls. GSEA requires both phenotype labels and gene expression data. However, gene expression is measured far more often for model organisms than for less-studied species. More importantly, human gene expression often cannot be measured under specific conditions, such as non-approved treatments or gene knockouts, because direct experiments carry high risk; mouse data are then often used as a substitute. Thus, predicting the enrichment significance (for a phenotype) of a given gene set in one species (the target, say human) from gene expression measured under the same phenotype in another species (the source, say mouse) is a vital and challenging problem, which we call the CROSS-species gene set enrichment problem (XGSEP).

RESULTS: For XGSEP, we propose CROSS-species gene set enrichment analysis (XGSEA), which has three steps: (1) running GSEA on the source species to obtain enrichment scores and $p$-values of source gene sets; (2) representing the relation between source and target gene sets by domain adaptation; and (3) using regression to predict $p$-values of target gene sets, based on the representation in (2). We extensively validated XGSEA using five regression measures and one classification measure on four real data sets under various settings, and found that XGSEA significantly outperformed three baseline methods in most cases. A case study identifying important human pathways for T-cell dysfunction and reprogramming from mouse ATAC-Seq data further supported the reliability of XGSEA.

AVAILABILITY: Source code of XGSEA is available at https://github.com/LiminLi-xjtu/XGSEA.
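The three steps can be illustrated with a minimal sketch. It is not the released XGSEA implementation: it assumes the source $p$-values from step (1) are already available from any GSEA tool, assumes source and target gene sets are encoded as membership vectors over a shared feature index (for example ortholog groups), and swaps in plain subspace alignment and ridge regression as stand-ins for the domain adaptation and regression steps. All function names are hypothetical.

```python
import numpy as np


def pca_basis(X, k):
    """Return the top-k principal directions of X (rows = gene sets) as columns."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T


def subspace_align(Xs, Xt, k):
    """Step (2) stand-in: subspace alignment of source onto target.

    Assumption: Xs and Xt share the same feature columns.
    """
    Ps, Pt = pca_basis(Xs, k), pca_basis(Xt, k)
    # Rotate the source subspace toward the target subspace.
    Zs = (Xs - Xs.mean(axis=0)) @ Ps @ (Ps.T @ Pt)
    Zt = (Xt - Xt.mean(axis=0)) @ Pt
    return Zs, Zt


def ridge_fit(Z, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])
    penalty = lam * np.eye(Zb.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Zb.T @ Zb + penalty, Zb.T @ y)


def ridge_predict(Z, w):
    return np.hstack([Z, np.ones((Z.shape[0], 1))]) @ w


def predict_target_pvalues(Xs, ps, Xt, k=5, lam=1.0):
    """Step (3) stand-in: regress -log10(p) on aligned source features,
    then predict p-values for the target gene sets."""
    Zs, Zt = subspace_align(Xs, Xt, k)
    y = -np.log10(np.clip(ps, 1e-300, 1.0))
    w = ridge_fit(Zs, y, lam)
    y_hat = np.clip(ridge_predict(Zt, w), 0.0, None)  # keep p <= 1
    return 10.0 ** (-y_hat)


# Synthetic illustration only: random memberships and random source p-values.
rng = np.random.default_rng(0)
Xs = rng.integers(0, 2, size=(40, 30)).astype(float)  # 40 source gene sets
Xt = rng.integers(0, 2, size=(25, 30)).astype(float)  # 25 target gene sets
ps = rng.uniform(1e-4, 1.0, size=40)                   # step (1) output, assumed given
pt_hat = predict_target_pvalues(Xs, ps, Xt)
print(pt_hat.shape)
```

Regressing on $-\log_{10} p$ rather than on $p$ directly is a design choice of this sketch; it spreads out the small $p$-values that matter most and makes it easy to map predictions back into the interval $(0, 1]$.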
