Classification and gene selection of triple-negative breast cancer subtype embedding gene connectivity matrix in deep neural network

Triple-negative breast cancer (TNBC) is a challenging breast cancer subtype for oncological therapy. It is commonly divided into six molecular subtypes, and accurate, stable classification of these subtypes is essential for personalized treatment of TNBC. In this study, we propose a new framework to distinguish the six subtypes of TNBC; it is one of only a handful of studies that perform this classification using both mRNA and long noncoding RNA expression data. In particular, we developed a gene selection approach named DGGA, which takes correlation information between genes into account when measuring gene importance and then removes redundant genes. To improve gene selection performance, we devised a gene scoring approach that combines GeneRank scores with the gene importance generated by a deep neural network (DNN), thereby accounting for both inter-subtype discrimination and correlations between genes. More importantly, we embedded a gene connectivity matrix in the DNN for sparse learning, so that weight changes during training are also considered when measuring the relative importance of each gene. Finally, a genetic algorithm, which simulates the natural evolutionary process, was used to search for the optimal gene subset for TNBC subtype classification. We validated the proposed method through cross-validation, and the results demonstrate that it achieves more accurate classification with fewer genes. The implementation of the proposed method is available at https://github.com/RanSuLab/TNBC.
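
The sketch below illustrates, under stated assumptions, how a gene connectivity matrix can sparsify a DNN layer and how a per-gene importance can be propagated with a GeneRank-style score. It is not the code from the repository above. The function names, the correlation threshold used to build the connectivity matrix, the damping factor d = 0.5, and the use of the DNN importance as the initial GeneRank score are all illustrative assumptions; only the linear system solved in `generank` follows the published GeneRank formulation.

```python
import numpy as np

def connectivity_matrix(X, threshold=0.5):
    """Binary gene-gene adjacency from absolute Pearson correlation.

    X has shape (samples, genes). The thresholding rule is an assumption
    made for illustration, not necessarily the rule used in the paper.
    """
    corr = np.corrcoef(X, rowvar=False)        # genes are columns
    A = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(A, 1.0)                   # keep each gene's own connection
    return A

def masked_layer(X, W, b, A):
    """Sparse first layer: ReLU(X @ (W * A) + b).

    Multiplying W elementwise by A zeroes every weight between
    genes that are not connected in A.
    """
    return np.maximum(0.0, X @ (W * A) + b)

def generank(A, init_score, d=0.5):
    """GeneRank-style score: solve (I - d * W * D^-1) r = (1 - d) * ex.

    W is A without self-loops, D holds the gene degrees, and ex is the
    normalized initial score (assumed here to be a DNN-derived importance).
    """
    W = A.copy()
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=0)
    deg[deg == 0] = 1.0                        # avoid dividing by zero for isolated genes
    ex = np.asarray(init_score, dtype=float)
    ex = ex / ex.sum()
    n = ex.shape[0]
    M = np.eye(n) - d * (W / deg)              # W / deg scales column j by 1 / deg[j]
    return np.linalg.solve(M, (1.0 - d) * ex)
```

With d = 0 the score reduces to the normalized initial importance, and larger values of d give more weight to a gene's neighbours in the connectivity matrix. Genes could then be ranked by the resulting score before a search procedure such as a genetic algorithm selects the final subset.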
