Enhanced QSAR Model Performance by Integrating Structural and Gene Expression Information

Despite decades of intensive research and a number of demonstrable successes, quantitative structure-activity relationship (QSAR) models still fail to yield reasonably accurate predictions in some circumstances, especially when the QSAR paradox occurs, that is, when structurally similar compounds show different activities. To avoid the QSAR paradox, we propose a novel integrated approach that improves model performance by using both structural and biological information about compounds. As a proof of concept, integrated models were built on a toxicological dataset to predict the non-genotoxic carcinogenicity of compounds, using not only conventional molecular descriptors but also the expression profiles of significant genes selected from microarray data. On the test set, the prediction accuracy of the QSAR model increased from 0.57 to 0.67 when the expression data of just one selected signature gene were incorporated. This successful integration of biological information into a classic QSAR model provides new insight and a methodology for building predictive models, especially when the QSAR paradox occurs.
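
The integration idea can be illustrated with a minimal sketch. It rests on several assumptions: the data are synthetic toy values invented for illustration, not taken from the study; the nearest-centroid classifier is a simple stand-in, not the classifier the authors used; and all names (`fit_centroids`, `integrate`, and so on) are hypothetical. The sketch shows only the mechanics of appending one gene expression value to each compound's descriptor vector, for compounds whose descriptors alone barely differ between classes.

```python
from math import dist

def fit_centroids(X, y):
    """Return {label: centroid} computed from feature rows X and labels y."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))

def accuracy(centroids, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    hits = sum(predict(centroids, x) == lab for x, lab in zip(X, y))
    return hits / len(y)

def integrate(descriptors, expression):
    """Concatenate structural descriptors with gene expression values per compound."""
    return [d + e for d, e in zip(descriptors, expression)]

# Synthetic toy data (assumption): near-identical descriptors, different labels.
desc_train = [[0.50, 1.20], [0.52, 1.18], [0.51, 1.21], [0.49, 1.19]]
expr_train = [[2.1], [0.2], [1.9], [0.1]]   # one hypothetical signature gene
y_train = [1, 0, 1, 0]                       # 1 = carcinogen, 0 = non-carcinogen

desc_test = [[0.50, 1.19], [0.51, 1.20]]
expr_test = [[2.0], [0.3]]
y_test = [1, 0]

# Structure-only model versus integrated (structure + expression) model.
structure_model = fit_centroids(desc_train, y_train)
integrated_model = fit_centroids(integrate(desc_train, expr_train), y_train)

acc_structure = accuracy(structure_model, desc_test, y_test)
acc_integrated = accuracy(integrated_model, integrate(desc_test, expr_test), y_test)
print(acc_structure, acc_integrated)
```

In this toy setting the structure-only model cannot separate the classes, while the integrated model can. With real data the descriptor and expression features would normally be scaled to comparable ranges before concatenation, and the gain would depend on the dataset.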
