Learning a Weighted Meta-Sample Based Parameter Free Sparse Representation Classification for Microarray Data

Sparse representation classification (SRC) is one of the most promising classification methods for supervised learning. It can effectively exploit discriminating information by introducing a regularization term to the data. With the desirable property of sparsity, SRC is robust to both noise and outliers. In this study, we propose a weighted meta-sample based non-parametric sparse representation classification method for the accurate identification of tumor subtypes. The proposed method includes three steps. First, we extract the weighted meta-samples for each subclass from the raw data, and the rationality of the weighting strategy is proven mathematically. Second, sparse representation coefficients are obtained by regularization of underdetermined linear equations, so that data-dependent sparsity can be adaptively tuned. Third, a simple characteristic function is used to achieve classification. We also give an asymptotic time complexity analysis of our method. Compared with some state-of-the-art classifiers, the proposed method has lower time complexity and more flexibility. Experiments on eight samples of publicly available gene expression profile data show the effectiveness of the proposed method.
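
The three steps can be illustrated with a minimal sketch of generic meta-sample SRC. It is not the proposed method itself: the meta-samples are taken as unweighted truncated-SVD bases per class, the regularization weight `lam` is fixed by hand rather than tuned adaptively, and the final rule assigns the class with the smallest class-wise reconstruction residual, as in standard SRC. The function names `class_meta_samples`, `ista_l1`, and `src_predict`, and the parameters `k`, `lam`, and `n_iter`, are all hypothetical and chosen for illustration.

```python
import numpy as np


def class_meta_samples(X, y, k):
    """Stack the top-k left singular vectors of each class as meta-samples.

    X has shape (genes, samples). Plain truncated SVD is an assumption;
    the weighting strategy described in the abstract is not reproduced.
    """
    atoms, labels = [], []
    for c in np.unique(y):
        U, _, _ = np.linalg.svd(X[:, y == c], full_matrices=False)
        atoms.append(U[:, :k])
        labels.extend([c] * k)
    return np.hstack(atoms), np.array(labels)


def ista_l1(D, x, lam=0.01, n_iter=500):
    """Minimise 0.5*||x - D a||^2 + lam*||a||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))  # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return a


def src_predict(D, atom_labels, x, lam=0.01):
    """Assign x to the class whose meta-samples reconstruct it best."""
    a = ista_l1(D, x, lam)
    classes = np.unique(atom_labels)
    residuals = [
        np.linalg.norm(x - D[:, atom_labels == c] @ a[atom_labels == c])
        for c in classes
    ]
    return classes[int(np.argmin(residuals))]
```

With a training matrix `X` of shape (genes, samples) and a label vector `y`, one would call `D, atom_labels = class_meta_samples(X, y, k)` once and then `src_predict(D, atom_labels, x)` for each test profile `x`; `k` must not exceed the number of samples in the smallest class.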
