The classification of cancer stage microarray data

Correctly diagnosing the cancer stage is essential for selecting an appropriate treatment option for a patient. Recent advances in microarray technology allow the cancer stage to be predicted from gene expression patterns. Cancer stage is measured on an ordinal scale. In this paper, we model the cancer stage from gene expression microarray data using strict ordinal regression methods: the cumulative logit model from traditional statistics, combined with data dimensionality reduction; distribution-free large margin rank boundaries implemented with the support vector machine; and an ensemble ranking scheme. Predictive genes included in the models are selected by univariate feature ranking and recursive feature elimination. We perform cross-validation experiments to assess and compare the classification accuracies of ordinal and non-ordinal algorithms on five cancer stage microarray datasets. We conclude that a strict ordinal classifier trained by a validated approach can predict the cancer stage more accurately than traditional non-ordinal classifiers, which do not consider the order of cancer stages.
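
As a rough illustration of how a cumulative logit model turns a linear gene expression score into an ordinal stage prediction, the sketch below assumes the standard proportional odds form P(Y <= j | x) = sigmoid(theta_j - beta . x). The function names, the coefficient vector `beta`, and the cut points `thresholds` are hypothetical and chosen only for illustration; they are not the fitted values or the implementation used in the paper.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def stage_probabilities(x, beta, thresholds):
    """Return [P(Y = 1 | x), ..., P(Y = K | x)] under a cumulative logit model.

    Assumes P(Y <= j | x) = sigmoid(thresholds[j] - beta . x), with
    thresholds sorted in increasing order (K - 1 cut points for K stages).
    """
    score = sum(b * v for b, v in zip(beta, x))
    # Cumulative probabilities for stages 1..K-1, then 1.0 for stage K.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


def predict_stage(x, beta, thresholds):
    """Return the most probable stage, numbered from 1."""
    probs = stage_probabilities(x, beta, thresholds)
    return 1 + max(range(len(probs)), key=probs.__getitem__)


# Hypothetical two-gene model with three stages (made-up numbers).
beta = [1.0, -0.5]
thresholds = [-1.0, 1.0]
low = predict_stage([-5.0, 0.0], beta, thresholds)
mid = predict_stage([0.0, 0.0], beta, thresholds)
high = predict_stage([5.0, 0.0], beta, thresholds)
```

Because all stages share one coefficient vector and differ only in their cut points, a larger score can only shift probability toward later stages, which is how the model respects the stage ordering that a non-ordinal classifier ignores.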
