Adaptive Unsupervised Feature Learning for Gene Signature Identification in Non-Small-Cell Lung Cancer

Non-small-cell lung cancer (NSCLC) is the most common type of lung cancer, accounting for nearly 85% of cases. The increasing availability of genome-wide gene expression data has facilitated the identification of gene signatures that are important for the precise classification of NSCLC subtypes and for personalized treatment decisions. Unsupervised feature selection is an effective computational technique for finding the most discriminative feature subset, both to distinguish different classes and to uncover information embedded in biological data. In this study, we proposed a novel unsupervised feature selection method to identify gene signatures for NSCLC subtype classification from gene expression data. The proposed method incorporates linear discriminant analysis, adaptive structure preservation, and $l_{2,1}$-norm sparse regression into a joint learning framework that selects the informative genes. An effective algorithm was developed to solve the resulting optimization problem. Furthermore, we performed module-based gene filtering before feature selection to reduce the computational cost. We evaluated the proposed method on an NSCLC gene expression dataset from The Cancer Genome Atlas (TCGA). The experimental results show that the proposed method identified a small number of gene signatures that allow accurate NSCLC subtype classification. We also performed enrichment analysis of the identified gene signatures to summarize the key biological processes involved.
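To illustrate how $l_{2,1}$-norm sparse regression can drive feature selection, the following minimal sketch computes the $l_{2,1}$-norm of a weight matrix (the sum of the Euclidean norms of its rows) and ranks features by row norm. It is not the joint objective or the solver of the proposed method, which the abstract does not specify. The function names `l21_norm` and `rank_features` and the toy matrix `W` are assumptions made for illustration only.

```python
import numpy as np


def l21_norm(W):
    """Return the l2,1-norm of W: the sum of the Euclidean norms of its rows."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def rank_features(W, k):
    """Return the indices of the k rows of W with the largest Euclidean norm.

    Each row of W is assumed to hold the regression weights of one feature
    (gene), so rows shrunk towards zero mark features that can be discarded.
    """
    row_norms = np.linalg.norm(W, axis=1)
    # Sort by descending row norm; a stable sort keeps ties in index order.
    return np.argsort(-row_norms, kind="stable")[:k]


# Hypothetical weight matrix for 4 genes and 2 projection dimensions
# (illustrative values, not taken from the study).
W = np.array([
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0: this gene would be dropped
    [1.0, 0.0],  # row norm 1
    [0.0, 2.0],  # row norm 2
])

print(l21_norm(W))          # 8.0
print(rank_features(W, 2))  # [0 3]
```

Because the penalty sums row norms rather than individual entries, minimizing it tends to zero out entire rows, which is why the surviving rows can be read as a selected feature subset.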
