A method of gene expression data transfer from cell lines to cancer patients for machine-learning prediction of drug efficiency

ABSTRACT Personalized medicine implies that distinct treatment methods are prescribed to individual patients according to several features that may be obtained from, e.g., the gene expression profile. Most machine learning methods suffer from a shortage of preceding cases, i.e., gene expression data from patients combined with the confirmed outcomes of known treatment methods. At the same time, thousands of cell lines have been treated with hundreds of anti-cancer drugs to test the ability of these drugs to stop cell proliferation, and all of these cell line cultures have been profiled in terms of their gene expression. Here we present a new machine learning approach that predicts the clinical efficiency of anti-cancer drugs for individual patients by transferring features obtained from expression-based data on cell lines. The method was validated on three datasets for cancer-like diseases (chronic myeloid leukemia, lung adenocarcinoma, and renal carcinoma) treated with targeted drugs, namely kinase inhibitors such as imatinib or sorafenib.
