Selection of patient samples and genes for outcome prediction

Gene expression profiles with clinical outcome data enable monitoring of disease progression and prediction of patient survival at the molecular level. We present a new computational method for outcome prediction. Our idea is to train on an informative subset of the original training samples. This subset consists only of short-term survivors, who died within a short period, and long-term survivors, who were still alive after a long follow-up time. These extreme training samples provide a clear basis for identifying genes whose expression is related to survival. To find relevant genes, we combine two feature selection methods, an entropy measure and the Wilcoxon rank sum test, so that a set of sharply discriminating features is identified. The selected training samples and genes are then integrated by a support vector machine to build a prediction model, which assigns each validation sample a survival/relapse risk score used to draw Kaplan-Meier survival curves. We apply this method to two data sets: diffuse large-B-cell lymphoma (DLBCL) and primary lung adenocarcinoma. In both cases, patients in the high- and low-risk groups stratified by our risk scores are clearly distinguishable. We also compare our risk scores to clinical factors, such as the International Prognostic Index score for DLBCL and tumor stage for lung adenocarcinoma. Our results indicate that gene expression profiles combined with carefully chosen learning algorithms can predict patient survival for certain diseases.
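
The first two steps described above (keeping only extreme survivors, then ranking genes by how well they separate the two groups) can be illustrated with a minimal sketch. It is not the authors' implementation. The `Patient` record, the function names, the cutoff values and the toy data are all hypothetical. The sketch covers only the Wilcoxon rank sum step, using a normal approximation without tie correction, and leaves out the entropy measure, the support vector machine and the Kaplan-Meier analysis.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Patient:
    """Hypothetical record: follow-up time, death indicator, expression values."""
    follow_up_years: float
    died: bool
    expression: List[float]  # one value per gene


def select_extreme_samples(
    patients: Sequence[Patient], short_cutoff: float, long_cutoff: float
) -> Tuple[List[Patient], List[Patient]]:
    """Keep only short-term and long-term survivors; drop everyone in between.

    Both cutoffs are assumptions chosen by the user, not values from the paper.
    """
    # Short-term survivors: death observed within the short cutoff.
    short_term = [p for p in patients
                  if p.died and p.follow_up_years <= short_cutoff]
    # Long-term survivors: still under follow-up at or beyond the long cutoff.
    long_term = [p for p in patients if p.follow_up_years >= long_cutoff]
    return short_term, long_term


def rank_sum_z(x: Sequence[float], y: Sequence[float]) -> float:
    """Wilcoxon rank sum z-score for x versus y (normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1..j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (rank_sum_x - mean) / sd


def rank_genes(short_term: Sequence[Patient],
               long_term: Sequence[Patient]) -> List[int]:
    """Return gene indices ordered from most to least discriminating."""
    n_genes = len(short_term[0].expression)
    scores = []
    for g in range(n_genes):
        x = [p.expression[g] for p in short_term]
        y = [p.expression[g] for p in long_term]
        scores.append((abs(rank_sum_z(x, y)), g))
    return [g for _, g in sorted(scores, reverse=True)]


# Made-up toy cohort: gene 0 separates the groups, gene 1 does not.
toy_patients = [
    Patient(0.5, True, [9.0, 1.0]),
    Patient(0.8, True, [8.0, 2.0]),
    Patient(0.9, True, [7.5, 1.5]),
    Patient(3.0, True, [5.0, 1.2]),    # intermediate: excluded from training
    Patient(4.0, False, [5.5, 1.8]),   # intermediate: excluded from training
    Patient(8.5, False, [2.0, 1.1]),
    Patient(9.0, False, [1.0, 1.9]),
    Patient(10.0, False, [3.0, 1.6]),
]
short_term, long_term = select_extreme_samples(toy_patients, 1.0, 8.0)
gene_order = rank_genes(short_term, long_term)
```

In this toy cohort the two intermediate patients are dropped, and the gene whose values differ cleanly between the groups is ranked first. In a full pipeline the top-ranked genes and the extreme samples would then be passed to a classifier such as a support vector machine, whose output serves as the risk score.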
