Computing Molecular Signatures as Optima of a Bi-Objective Function: Method and Application to Prediction in Oncogenomics

Background: Filter feature selection methods compute molecular signatures by selecting subsets of genes according to their ranking under a valuation function. The motivation for the choice of valuation function is almost always clearly stated, but the motivation for selecting genes according to their ranking is hardly ever made explicit.

Method: We addressed the computation of molecular signatures by searching for the optima of a bi-objective function whose solution space was the set of all possible molecular signatures, i.e., the set of subsets of genes. The two objectives were the size of the signature (to be minimized) and the interclass distance induced by the signature (to be maximized).

Results: We showed that: 1) the convex combination of the two objectives had exactly n optimal non-empty signatures, where n was the number of genes; 2) the n optimal signatures were nested; and 3) the optimal signature of size k was the subset of the k top-ranked genes, i.e., those that contributed the most to the interclass distance. We applied our feature selection method to five public datasets in oncology and assessed the prediction performances of the optimal signatures as input to the diagonal linear discriminant analysis (DLDA) classifier. These performances were on par with or better than the best reported ones. The predictions were robust, and the signatures were almost always significantly smaller. We studied in more detail the performances of our predictive modeling on two breast cancer datasets to predict the response to preoperative chemotherapy: the performances were higher than those previously reported, the signatures were about three times smaller (11 versus 30 genes), and the genes in the signature were known to be involved in the response to chemotherapy.

Conclusions: Defining molecular signatures as the optima of a bi-objective function that combined the signature size and the interclass distance was well founded and efficient for prediction in oncogenomics. The complexity of the computation was very low because the optimal signatures were the sets of top-ranked genes under their valuation. Software can be freely downloaded from http://gardeux-vincent.eu/DeltaRanking.php
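The three results above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the abstract: the interclass distance is assumed to be additive over genes, with each gene contributing the squared difference between its two class means, and the convex combination is written as lam * distance(S) - (1 - lam) * |S|. The function names (`gene_contributions`, `nested_signatures`, `brute_force_optimum`) are hypothetical and are not the API of the downloadable software. Under these assumptions, an exhaustive search over all non-empty subsets should always return one of the nested top-k signatures.

```python
from itertools import combinations

import numpy as np


def gene_contributions(X, y):
    """Per-gene contribution to an ASSUMED additive interclass distance:
    the squared difference between the two class means of each gene."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean0 = X[y == 0].mean(axis=0)
    mean1 = X[y == 1].mean(axis=0)
    return (mean1 - mean0) ** 2


def nested_signatures(contrib):
    """Return the n nested candidate signatures: top-1, top-2, ..., top-n genes."""
    order = np.argsort(-contrib, kind="stable")  # genes by decreasing contribution
    return [tuple(int(g) for g in order[:k]) for k in range(1, len(order) + 1)]


def brute_force_optimum(contrib, lam):
    """Exhaustively maximise lam * distance(S) - (1 - lam) * |S| over non-empty S.

    Exponential in the number of genes; only meant to check the ranking
    shortcut on toy data.
    """
    n = len(contrib)
    best_set, best_val = None, -np.inf
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            val = lam * contrib[list(subset)].sum() - (1 - lam) * k
            if val > best_val:
                best_set, best_val = subset, val
    return set(best_set)


if __name__ == "__main__":
    # Toy data: 4 samples (rows) x 4 genes (columns), two classes.
    X = [[0, 0, 0, 0], [0, 0, 0, 0], [3, 1, 2, 0.5], [3, 1, 2, 0.5]]
    y = [0, 0, 1, 1]
    contrib = gene_contributions(X, y)
    candidates = [set(s) for s in nested_signatures(contrib)]
    for lam in (0.05, 0.15, 0.3, 0.6, 0.9):
        optimum = brute_force_optimum(contrib, lam)
        print(lam, sorted(optimum), optimum in candidates)
```

The ranking itself costs one sort of the per-gene contributions, which is consistent with the low complexity stated in the conclusions; the exhaustive search is included only as a check on small inputs.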
