Fuzzy measure with regularization for gene selection and cancer prediction

Dealing with high-dimensional gene expression data is challenging, and selecting multiple informative subsets of genes is crucial for cancer classification. Many regularized statistical and machine learning methods have been developed for this purpose. However, these methods neglect epistasis, i.e., the fact that some genes may mask or affect other genes. In this article, we propose a fuzzy measure with regularization (FMR), which adopts L1 and L1/2 norms to obtain sparse solutions, to describe the interaction between genes. Regularization with L1 and L1/2 yields a series of sparse solutions, which helps solve for the fuzzy measure more quickly than traditional methods such as the Genetic Algorithm. FMR obtains the subset of genes corresponding to the fewest nonzero fuzzy measure values and then selects the important gene(s) according to how frequently they appear in the selected gene subsets. Three base classifiers, SVM, KNN and DBN, are employed as underlying models to verify the effectiveness of the selected subset(s) of genes. Experimental results indicate that the genes selected by FMR are consistent with several clinical studies. In addition, FMR achieves accuracy comparable to that of other methods reported in the literature. The code used in this article is freely available at: https://github.com/wangphoenix/ICMLC .
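The final selection step, ranking genes by how often they appear across the selected subsets, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rank_genes_by_frequency` and the gene labels are hypothetical, and the subsets are assumed to have been produced already by the regularized fuzzy measure step.

```python
from collections import Counter
from itertools import chain


def rank_genes_by_frequency(subsets):
    """Count how often each gene appears across the selected subsets.

    `subsets` is an iterable of gene collections. Each collection is
    assumed (hypothetically) to hold the genes tied to the nonzero fuzzy
    measure values of one sparse solution.
    """
    # Use set() so a gene is counted at most once per subset.
    counts = Counter(chain.from_iterable(set(s) for s in subsets))
    # Sort by descending count, then by name for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical subsets with placeholder gene labels, not real results.
selected_subsets = [
    {"G1", "G2"},
    {"G2", "G3"},
    {"G1", "G2", "G4"},
]

ranking = rank_genes_by_frequency(selected_subsets)
print(ranking)
```

Genes at the top of `ranking` would be the candidates passed to a downstream classifier; how many to keep is a design choice the sketch leaves open.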
