An efficient SVM based tumor classification with symmetry Non-negative Matrix Factorization using gene expression data

Reliable and accurate identification of tumor type is crucial to the proper treatment of cancer, so tumor classification is both a practical and a theoretical necessity. DNA microarrays provide a technique for measuring gene expression that has attracted considerable research interest in recent years. It has been suggested that gene expression data from microarrays (biochips) can be employed in many biomedical areas, including cancer classification. Although several classification methods, both new and existing, have been tested, selecting a proper (optimal) set of genes whose expression levels can serve as the basis for classification is still an open problem. This paper presents a new method for tumor classification using gene expression data. In the proposed method, genes are first selected using Nonnegative Matrix Factorization (NMF). To improve classification performance, Symmetric NMF (SymNMF) is used in this approach, and features are then extracted from the selected genes by means of SymNMF. As a last step, the tumor samples are classified from the extracted features using a Support Vector Machine with Weighted Kernel Width (WSVM), chosen to improve classification. The proposed approach is tested on the colon cancer data set and the acute leukemia data set. The experimental results indicate that the proposed approach performs better than the traditional approaches.
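The first two stages described above (NMF-based gene selection, then SymNMF-based feature extraction) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the gene-ranking criterion, the similarity matrix, or the update rules, so the sketch makes these assumptions: standard multiplicative updates for NMF, a damped multiplicative update for SymNMF, genes ranked by their largest metagene loading, and sample similarity taken as the Gram matrix of the selected genes. The data are synthetic, and the WSVM stage is omitted because its weighting scheme is not specified here.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates for V ~ W @ H with W, H >= 0 (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def symnmf(A, rank, n_iter=300, beta=0.5, eps=1e-9, seed=0):
    """Damped multiplicative updates for symmetric A ~ H @ H.T with H >= 0."""
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], rank)) + eps
    for _ in range(n_iter):
        H *= (1.0 - beta) + beta * (A @ H) / (H @ (H.T @ H) + eps)
    return H

# Synthetic toy data (NOT the colon or leukemia sets): 40 genes x 12 samples.
rng = np.random.default_rng(0)
n_genes, n_samples, rank, n_keep = 40, 12, 2, 10
V = rng.random((n_genes, n_samples))
V[:5, :6] += 3.0    # genes 0-4 are elevated in samples 0-5
V[5:10, 6:] += 3.0  # genes 5-9 are elevated in samples 6-11

# Stage 1 (assumed criterion): keep the genes with the largest metagene loading.
W, H = nmf(V, rank)
scores = W.max(axis=1)
selected = np.argsort(scores)[::-1][:n_keep]

# Stage 2 (assumed similarity): SymNMF on the sample-by-sample Gram matrix.
X = V[selected]      # n_keep x n_samples, nonnegative
A = X.T @ X          # symmetric and nonnegative
F = symnmf(A, rank)  # one row of `rank` features per sample

print(F.shape)
```

Each row of `F` would then serve as the feature vector for one tumor sample in the classification stage. With real data, the factorization rank and the number of retained genes should be chosen by cross-validation rather than fixed as they are here.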
