A Novel Gene Selection Algorithm based on Sparse Representation and Minimum-redundancy Maximum-relevancy of Maximum Compatibility Center

Tumor classification is important for accurate diagnosis and personalized treatment and has recently received great attention. Analysis of gene expression profiles has shown relevant biological significance and has therefore become a research hotspot and a new challenge for bio-data mining. Among existing methods, some identify a small number of genes but at high time complexity, whereas others run with low time complexity but yield unsatisfactory classification accuracy. In this paper, we propose a gene selection and classification method for tumor subtypes based on the Minimum-Redundancy Maximum-Relevancy (MRMR) of the maximum compatibility center. First, we perform a fuzzy clustering of gene expression profiles based on the compatibility relation. Next, we use the sparse representation coefficient to assess the importance of each gene for the category, extract the top-ranked genes, and remove the uncorrelated genes. Finally, we use the MRMR search strategy to select the characteristic genes, reject the redundant genes, and obtain the final subset of characteristic genes. We tested our method and four others on four different datasets to verify its effectiveness. Results show that our method outperforms the other methods in both classification accuracy and standard deviation. The proposed method is robust, adaptable, and superior in classification. It can help discover the susceptibility genes associated with complex diseases and clarify the interactions between these genes. Our technique provides a new way of thinking that is important for understanding the pathogenesis of complex diseases and for their prevention, diagnosis, and treatment.
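
The final MRMR step can be illustrated with a minimal sketch. It assumes the commonly used greedy "relevance minus mean redundancy" criterion with mutual information on already-discretized expression values; the abstract does not specify the exact criterion, so this is not necessarily the procedure used in the paper, and the fuzzy clustering and sparse representation steps are omitted. The names `mutual_information`, `mrmr_select`, and the toy matrix `X` are hypothetical and serve illustration only.

```python
import numpy as np


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    a_vals, a_idx = np.unique(np.asarray(a).ravel(), return_inverse=True)
    b_vals, b_idx = np.unique(np.asarray(b).ravel(), return_inverse=True)
    # Build the joint probability table from co-occurrence counts.
    joint = np.zeros((a_vals.size, b_vals.size))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of a
    pb = joint.sum(axis=0, keepdims=True)  # marginal of b
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))


def mrmr_select(X, y, k):
    """Greedy mRMR: pick k columns (genes) of X, samples in rows.

    Assumed criterion: relevance I(gene; y) minus the mean redundancy
    I(gene; already selected gene).
    """
    n_genes = X.shape[1]
    relevance = np.array(
        [mutual_information(X[:, j], y) for j in range(n_genes)]
    )
    selected = [int(np.argmax(relevance))]  # start with the most relevant gene
    while len(selected) < min(k, n_genes):
        best_gene, best_score = -1, -np.inf
        for j in range(n_genes):
            if j in selected:
                continue
            redundancy = np.mean(
                [mutual_information(X[:, j], X[:, s]) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best_gene, best_score = j, score
        selected.append(best_gene)
    return selected


# Toy data (hypothetical): 8 samples, 4 discretized genes.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([
    [0, 0, 0, 0, 1, 1, 1, 0],  # gene 0: strongly related to the class
    [0, 0, 0, 0, 1, 1, 1, 0],  # gene 1: exact duplicate of gene 0
    [0, 1, 0, 0, 1, 1, 0, 1],  # gene 2: weaker, but adds new information
    [0, 1, 0, 1, 0, 1, 0, 1],  # gene 3: unrelated to the class
])
selected = mrmr_select(X, y, 2)
print(selected)
```

On this toy input the duplicate gene 1 is passed over in favor of gene 2, because its redundancy with the already selected gene 0 outweighs its relevance. Real expression data are continuous, so a discretization step (or a different dependence measure) would be needed before applying a sketch like this.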
