Utilization of ordinal response structures in classification with high-dimensional expression data

Molecular diagnosis and the prediction of clinical treatment outcome from high-throughput genomics data are modern applications of machine learning to clinical problems. In practice, clinical parameters such as patient health status or toxic reaction to therapy are often measured on an ordinal scale (e.g. good, fair, poor). The prediction of ordinal end-points is commonly treated as a multi-class classification problem, which disregards the ordering information contained in the response and may reduce prediction accuracy. Classical approaches that model an ordinal response directly, such as the cumulative logit model, are typically not applicable to high-dimensional data. We present hierarchical twoing (hi2), a novel algorithm for classifying high-dimensional data into ordered categories. hi2 combines the power of well-understood binary classification with ordinal response prediction. A comparison of several approaches to ordinal classification on real-world and simulated data shows that classification algorithms specifically designed to handle ordered categories fail to improve upon state-of-the-art non-ordinal classification algorithms. In general, the classification performance of an algorithm is dominated by its ability to deal with the high dimensionality of the data. Only hi2 outperforms its competitors in the case of moderate effects.
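To make the idea of combining binary classification with an ordered response more concrete, the following is a minimal sketch of one way to reduce an ordinal problem to a hierarchy of binary splits that respect the category order. It is not the hi2 implementation: the abstract does not specify how splits are chosen or which binary classifier is used, so the midpoint split, the nearest-centroid base classifier, and the names `fit` and `predict` are all assumptions made for illustration. The sketch also assumes every level occurs at least once in the training data.

```python
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(X, y, levels):
    """Recursively split the ordered `levels` into a lower and an upper group.

    Each internal node is a binary classifier (here: nearest centroid,
    an assumption) separating the lower from the upper group.
    Returns a nested dict for an internal node or a bare level for a leaf.
    """
    if len(levels) == 1:
        return levels[0]
    mid = len(levels) // 2  # assumption: split at the middle of the order
    lower, upper = levels[:mid], levels[mid:]
    lo_pairs = [(x, lab) for x, lab in zip(X, y) if lab in lower]
    hi_pairs = [(x, lab) for x, lab in zip(X, y) if lab in upper]
    X_lo, y_lo = [p[0] for p in lo_pairs], [p[1] for p in lo_pairs]
    X_hi, y_hi = [p[0] for p in hi_pairs], [p[1] for p in hi_pairs]
    return {
        "c_lo": centroid(X_lo),
        "c_hi": centroid(X_hi),
        "lo": fit(X_lo, y_lo, lower),
        "hi": fit(X_hi, y_hi, upper),
    }


def predict(tree, x):
    """Descend the tree, taking the branch whose centroid is closer to x."""
    while isinstance(tree, dict):
        go_low = sq_dist(x, tree["c_lo"]) <= sq_dist(x, tree["c_hi"])
        tree = tree["lo"] if go_low else tree["hi"]
    return tree


# Toy usage: one feature that increases with the ordinal level.
levels = ["poor", "fair", "good"]
X = [[0.0], [0.2], [1.0], [1.1], [2.0], [2.2]]
y = ["poor", "poor", "fair", "fair", "good", "good"]
tree = fit(X, y, levels)
```

Because every split separates a contiguous lower group of categories from a contiguous upper group, the ordering of the response is built into the tree structure, while each node remains an ordinary two-class problem that any binary classifier suited to high-dimensional data could solve.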
