Integrative and Personalized QSAR Analysis in Cancer by Kernelized Bayesian Matrix Factorization

With data from recent large-scale drug sensitivity measurement campaigns, it is now possible to build and test models that predict responses for more than one hundred anticancer drugs against several hundred human cancer cell lines. Traditional quantitative structure-activity relationship (QSAR) approaches search for structural properties of small molecules that predict biological activity in a single cell line or a single tissue type. We extend this line of research in two directions: (1) an integrative QSAR approach that predicts the responses to new drugs for a panel of multiple known cancer cell lines simultaneously, and (2) a personalized QSAR approach that predicts the responses to new drugs for new cancer cell lines. To solve the modeling task, we apply a novel kernelized Bayesian matrix factorization method. For maximum applicability and predictive performance, the method optionally uses genomic features of cell lines and target information on drugs in addition to chemical drug descriptors. In a case study with 116 anticancer drugs and 650 cell lines, we demonstrate the usefulness of the method in several relevant prediction scenarios that differ in the amount of available information, and we analyze the importance of various types of drug features for response prediction. Furthermore, after predicting the missing values of the data set, we explore a complete global map of drug response to assess the treatment potential and treatment range of therapeutically interesting anticancer drugs.
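
The general idea of a kernelized matrix factorization can be illustrated with a minimal sketch. It is not the kernelized Bayesian matrix factorization method applied in the paper: it replaces the Bayesian inference with plain ridge-regularized alternating least squares, uses a single Gaussian kernel per side, and runs on random toy data. All function names, the kernel choice, and the parameters `rank`, `lam`, and `gamma` are assumptions made for illustration. The drug-by-cell-line response matrix `Y` (missing entries as NaN) is approximated by `(K_d @ A_d) @ (K_c @ A_c).T`, where `K_d` and `K_c` are kernels computed from drug descriptors and cell line features.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel between the rows of X and the rows of Z."""
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def objective(Y, mask, K_d, K_c, A_d, A_c, lam):
    """Squared error on observed entries plus ridge penalty on both projections."""
    resid = (K_d @ A_d) @ (K_c @ A_c).T - Y
    return (resid[mask] ** 2).sum() + lam * ((A_d ** 2).sum() + (A_c ** 2).sum())

def _ridge_block(K, H_other, rows, rows_other, y, lam, rank):
    """Exact ridge update of one projection matrix with the other side fixed."""
    # One design row per observed entry, one column per element of the projection.
    Phi = (K[rows][:, :, None] * H_other[rows_other][:, None, :]).reshape(len(y), -1)
    a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
    return a.reshape(K.shape[0], rank)

def fit_kernel_mf(Y, K_d, K_c, rank=2, lam=0.1, n_sweeps=20, seed=0):
    """Alternating ridge regression; a non-Bayesian stand-in for illustration only."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(Y)
    I, J = np.nonzero(mask)          # indices of observed (drug, cell line) pairs
    y = Y[I, J]
    A_d = 0.1 * rng.standard_normal((K_d.shape[0], rank))
    A_c = 0.1 * rng.standard_normal((K_c.shape[0], rank))
    history = [objective(Y, mask, K_d, K_c, A_d, A_c, lam)]
    for _ in range(n_sweeps):
        A_d = _ridge_block(K_d, K_c @ A_c, I, J, y, lam, rank)
        A_c = _ridge_block(K_c, K_d @ A_d, J, I, y, lam, rank)
        history.append(objective(Y, mask, K_d, K_c, A_d, A_c, lam))
    return A_d, A_c, history

def predict_new_drugs(K_new_d, A_d, K_c, A_c):
    """Responses of new drugs (kernel against training drugs) on known cell lines."""
    return (K_new_d @ A_d) @ (K_c @ A_c).T

# Toy data (random, purely illustrative): 12 drugs, 15 cell lines.
rng = np.random.default_rng(1)
X_drugs = rng.standard_normal((12, 5))   # hypothetical chemical descriptors
X_cells = rng.standard_normal((15, 6))   # hypothetical genomic features
Y = rng.standard_normal((12, 15))
Y[rng.random(Y.shape) < 0.2] = np.nan    # hide roughly 20% of the responses

K_d = gaussian_kernel(X_drugs, X_drugs, gamma=0.1)
K_c = gaussian_kernel(X_cells, X_cells, gamma=0.1)
A_d, A_c, history = fit_kernel_mf(Y, K_d, K_c)

Y_filled = (K_d @ A_d) @ (K_c @ A_c).T   # completed response map
X_new = rng.standard_normal((3, 5))      # descriptors of three unseen drugs
Y_new = predict_new_drugs(gaussian_kernel(X_new, X_drugs, gamma=0.1), A_d, K_c, A_c)
```

Because predictions depend on a new drug only through its kernel values against the training drugs, the same fitted projections cover the integrative scenario (new drugs, known cell lines). Applying the analogous kernel on the cell line side would correspond to the personalized scenario.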
