Inferring subgroup-specific driver genes from heterogeneous cancer samples via subspace learning with subgroup indication

MOTIVATION Detecting driver genes from gene mutation data is a fundamental task in tumorigenesis research. Because cancer is a heterogeneous disease with various subgroups, subgroup-specific driver genes are key to developing precision medicine for heterogeneous cancers. However, existing driver gene detection methods are not designed to identify the subgroup specificity of the driver genes they detect. They therefore cannot indicate which group of patients is associated with a given driver gene, which makes it difficult to provide specific clinical guidance for individual patients.

RESULTS By incorporating a subspace learning framework, we propose a novel bioinformatics method called DriverSub, which can efficiently predict subgroup-specific driver genes when subgroup annotations are not available. When evaluated on simulation datasets with known ground truth and compared with existing methods, DriverSub yields the best prediction of driver genes and the best inference of their related subgroups. When we apply DriverSub to mutation data from real heterogeneous cancers, its predictions are highly enriched for experimentally validated known driver genes. Moreover, the subgroups inferred by DriverSub are significantly associated with the annotated molecular subgroups, indicating its capability of predicting subgroup-specific driver genes.

AVAILABILITY The source code is publicly available at https://github.com/JianingXi/DriverSub.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
