Multiclass Sequential Feature Selection and Classification Method for Genomic Data

This paper presents an efficient multiclass sequential feature selection and classification (mk-SS) method based on gene expression signatures. The method is developed using 10-fold cross-validation to ensure stability, and its efficiency is assessed through the misclassification error rate and other performance measures. The performance of mk-SS was compared with that of Support Vector Machines (SVM) on five published multiclass microarray datasets. The results show that mk-SS efficiently selects informative gene biomarkers for correct classification of the biological groups of the tissue samples. The method competes favourably with SVM in prediction accuracy and outperforms it in 80% of the cases considered. The quality of the features selected by the mk-SS algorithm was validated by incorporating its feature selection scheme into the standard SVM algorithm, which significantly improved the predictive power of the standard SVM. This work shows that classification of various cancer types using gene expression profiles is feasible, especially when the endpoints are multi-category.

Keywords: k-SS, mk-SS, Support Vector Machines, Microarray, Misclassification error rate
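
The evaluation protocol named in the abstract, 10-fold cross-validation scored by the misclassification error rate, can be illustrated with a minimal sketch. This is not the mk-SS algorithm, whose details are not given here: a nearest-centroid rule stands in as a hypothetical placeholder classifier, the data are synthetic, and all function names (`kfold_indices`, `fit_centroids`, `predict`, `cv_misclassification_rate`) are assumptions made for illustration only.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Mean expression vector per class (placeholder classifier, not mk-SS)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist)

def cv_misclassification_rate(X, y, k=10, seed=0):
    """Fraction of samples misclassified when each fold is held out exactly once."""
    errors = 0
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / len(X)

# Synthetic three-class data: 20 samples per class, 5 "genes" per sample.
rng = random.Random(1)
X, y = [], []
for label, shift in enumerate([0.0, 5.0, 10.0]):
    for _ in range(20):
        X.append([shift + rng.gauss(0, 1) for _ in range(5)])
        y.append(label)

error_rate = cv_misclassification_rate(X, y, k=10)
print(f"10-fold CV misclassification error rate: {error_rate:.3f}")
```

In a real microarray study, any feature selection step would have to be repeated inside each training fold; selecting genes on the full dataset before cross-validation would bias the error estimate downward.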
