LJELSR: A Strengthened Version of JELSR for Feature Selection and Clustering

Feature selection and sample clustering play an important role in bioinformatics. Traditional feature selection methods perform embedding learning and sparse regression as separate steps. To identify the significant features of genomic data more effectively, Joint Embedding Learning and Sparse Regression (JELSR) was proposed, which performs the two jointly. However, because genomic data contain considerable redundancy and noise, the sparsity achieved by JELSR is far from sufficient. In this paper, we propose a strengthened version of JELSR, called LJELSR, which adds an L1-norm constraint on the regularization term of the original model to further improve sparsity. We then provide a new iterative algorithm to obtain a converged solution. Experimental results on different genomic data show that, compared with previous methods, our method achieves state-of-the-art performance both in identifying differentially expressed genes and in sample clustering. Additionally, the selected differentially expressed genes may be of considerable value to medical research.
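The abstract does not state the LJELSR objective or its update rules, so the following is only a hedged sketch of the general idea. It assumes the regression step takes the form min_W ||X^T W - Y||_F^2 + alpha*||W||_{2,1} + gamma*||W||_1, where X (d x n) holds genes as rows and samples as columns, Y (n x c) is a spectral embedding of the samples, and the gamma*||W||_1 term stands in for the added L1-norm constraint. Both norms are handled by iteratively reweighted least squares, a standard trick for l2,1 and l1 penalties. As a further simplification, Y is computed once from a heat-kernel graph Laplacian instead of being updated jointly with W, so this is a two-stage approximation and not the authors' iterative algorithm. The function names, the parameters alpha, gamma and eps, and the median bandwidth heuristic are all hypothetical.

```python
import numpy as np


def laplacian_embedding(X, n_components=1):
    """Spectral embedding of the samples (columns of X) from a dense
    heat-kernel graph. Returns Y with shape (n_samples, n_components)."""
    sq = np.sum(X ** 2, axis=0)
    dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    sigma2 = np.median(dist[dist > 0])  # bandwidth heuristic (assumption)
    S = np.exp(-dist / (2.0 * sigma2))
    np.fill_diagonal(S, 0.0)
    L = np.diag(S.sum(axis=1)) - S      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]  # skip the constant eigenvector


def sparse_regression(X, Y, alpha=1.0, gamma=0.1, n_iter=50, eps=1e-8):
    """IRLS sketch for the ASSUMED objective
        min_W ||X^T W - Y||_F^2 + alpha*||W||_{2,1} + gamma*||W||_1.
    X: (d, n) with features as rows; Y: (n, c). Returns W of shape (d, c)."""
    d, c = X.shape[0], Y.shape[1]
    XXt, XY = X @ X.T, X @ Y
    W = np.linalg.solve(XXt + np.eye(d), XY)  # ridge warm start
    for _ in range(n_iter):
        # l2,1 term: ||W||_{2,1} ~ tr(W^T D W), D_ii = 1 / (2 ||w^i||_2)
        D = np.diag(1.0 / (2.0 * np.sqrt(np.sum(W ** 2, axis=1) + eps)))
        W_new = np.empty_like(W)
        for j in range(c):
            # l1 term for column j: E_j = diag(1 / (2 |w_ij|)), smoothed by eps
            E = np.diag(1.0 / (2.0 * np.sqrt(W[:, j] ** 2 + eps)))
            # stationarity condition: (X X^T + alpha D + gamma E_j) w_j = X y_j
            W_new[:, j] = np.linalg.solve(XXt + alpha * D + gamma * E, XY[:, j])
        W = W_new
    return W


def select_features(W, k):
    """Rank features by the l2 norm of their row in W and return the top k."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    # Synthetic data: 20 features x 40 samples in two groups;
    # only features 0-2 differ between the groups.
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1], 20)
    X = rng.normal(size=(20, 40))
    X[:3] += np.where(labels == 0, -5.0, 5.0)
    Y = laplacian_embedding(X, n_components=1)
    W = sparse_regression(X, Y)
    print("top features:", select_features(W, 3))
```

Each sweep solves one d x d linear system per embedding dimension, which is acceptable for a sketch but would need a cheaper solver for genome-scale d. Raising gamma relative to alpha shifts the penalty from whole-row (gene-level) sparsity toward element-wise sparsity.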
