Elastic Net based Feature Ranking and Selection

Corresponding author: Shaode Yu (yushaodemia@163.com). Preprint submitted to ABC, January 1, 2021. arXiv:2012.14982v1 [cs.LG], 30 Dec 2020.

Feature selection is important in data representation and intelligent diagnosis, and elastic net is one of the most widely used feature selectors. However, the features it selects depend on the training data, and the weights it assigns serve regularized regression rather than reflecting feature importance, so using them for feature ranking degrades model interpretability and extensibility. In this study, an intuitive step is added after repeated rounds of data splitting and elastic net based feature selection: the frequency with which each feature is selected is counted and used as an indicator of its importance. After the features are sorted by frequency, a linear support vector machine performs classification in an incremental manner over the ranked features. Finally, a compact subset of discriminative features is selected by comparing prediction performance. Experimental results on four breast cancer data sets (BCDR-F03, WDBC, GSE 10810, and GSE 15852) suggest that the proposed framework achieves performance competitive with or superior to elastic net while consistently selecting fewer features. Improving its consistency on high-dimensional, small-sample-size data sets remains a focus of future work. The proposed framework is accessible online (https://github.com/NicoYuCN/elasticnetFR).
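
The procedure described in the abstract can be sketched in a few dozen lines. The following is a minimal illustration under stated assumptions, not the authors' released code: it uses only the Python standard library, with a small coordinate-descent elastic net and a subgradient-trained linear SVM standing in for whatever solvers the original implementation uses. All function names (`elastic_net`, `selection_frequency`, `train_linear_svm`, `incremental_selection`), the hyperparameters (`alpha`, `l1_ratio`, 30 splits, an 80% training fraction), the toy data, and the single hold-out evaluation are hypothetical choices made for the example.

```python
import random

def soft_threshold(z, g):
    """Soft-thresholding operator for the L1 part of the penalty."""
    return z - g if z > g else z + g if z < -g else 0.0

def center(X, y):
    """Subtract column means and the target mean, so no intercept is needed."""
    n, p = len(X), len(X[0])
    mx = [sum(r[j] for r in X) / n for j in range(p)]
    my = sum(y) / n
    return [[r[j] - mx[j] for j in range(p)] for r in X], [v - my for v in y]

def elastic_net(X, y, alpha=0.3, l1_ratio=0.5, sweeps=50):
    """Coordinate descent on (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 (hyperparameters are illustrative)."""
    X, y = center(X, y)
    n, p = len(X), len(X[0])
    w, resid = [0.0] * p, list(y)
    sq = [sum(r[j] ** 2 for r in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + sq[j] * w[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (sq[j] + alpha * (1 - l1_ratio))
            delta = new - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new
    return w

def selection_frequency(X, y, n_splits=30, train_frac=0.8, seed=0):
    """Count how often each feature receives a nonzero elastic net weight."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    for _ in range(n_splits):
        idx = rng.sample(range(len(X)), int(train_frac * len(X)))
        w = elastic_net([X[i] for i in idx], [y[i] for i in idx])
        for j, wj in enumerate(w):
            if wj != 0.0:
                counts[j] += 1
    return counts

def dot(w, x):
    return sum(a * c for a, c in zip(w, x))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularised hinge loss; labels in {-1, +1}."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violated: hinge term is active
                for j in range(p):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def accuracy(model, X, y):
    w, b = model
    return sum((dot(w, xi) + b >= 0) == (yi > 0) for xi, yi in zip(X, y)) / len(y)

def take(X, cols):
    return [[r[j] for j in cols] for r in X]

def incremental_selection(Xtr, ytr, Xte, yte, ranking):
    """Score the top-k ranked features for k = 1..p; keep the smallest best k."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        acc = accuracy(train_linear_svm(take(Xtr, cols), ytr), take(Xte, cols), yte)
        if acc > best_acc:  # strict '>' keeps the more compact subset on ties
            best_k, best_acc = k, acc
    return ranking[:best_k], best_acc

# Toy data (hypothetical): features 0 and 1 carry the label, the other six are noise.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(120)]
X = [[rng.gauss(0, 1) for _ in range(8)] for _ in y]
for row, label in zip(X, y):
    row[0] += 2 * label
    row[1] -= 2 * label

Xtr, ytr, Xte, yte = X[:80], y[:80], X[80:], y[80:]
counts = selection_frequency(Xtr, ytr)                # selection frequency per feature
ranking = sorted(range(8), key=lambda j: -counts[j])  # most frequent first
selected, acc = incremental_selection(Xtr, ytr, Xte, yte, ranking)
print("frequency:", counts)
print("selected features:", selected, "hold-out accuracy:", acc)
```

In this sketch the frequency counts are computed on the training portion only, so the held-out samples used to compare feature subsets never influence the ranking. Ties in accuracy are resolved in favour of the smaller subset, which is one plausible reading of selecting "a compact subset"; the evaluation protocol in the released code may differ.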
