Multi-Assay-Based Structure-Activity Relationship Models: Improving Structure-Activity Relationship Models by Incorporating Activity Information from Related Targets

Structure-activity relationship (SAR) models inform and guide the iterative optimization of chemical leads, and they play a fundamental role in modern drug discovery. In this paper, we present a new class of methods for building SAR models, referred to as multi-assay-based, that utilize activity information from different targets. These methods first identify a set of targets that are related to the target under consideration and then employ machine learning techniques that use activity information from these targets to build the desired SAR model. We developed different methods for identifying the set of related targets, which take into account the primary sequence of the targets or the structure of their ligands, and we developed different machine learning techniques derived from the principles of semi-supervised learning, multi-task learning, and classifier ensembles. A comprehensive evaluation shows that these methods lead to considerable improvements over standard SAR models, which are based only on the ligands of the target under consideration. On a set of 117 protein targets obtained from PubChem, the multi-assay-based methods achieve a receiver operating characteristic (ROC) score that is, on average, 7.0-7.2% higher than that achieved by the standard SAR models. Moreover, on a set of targets belonging to six protein families, the multi-assay-based methods outperform chemogenomics-based approaches by 4.33%.
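The first step described above, identifying related targets from the structure of their ligands, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the use of mean pairwise Tanimoto similarity as the target-to-target score, the toy feature sets, and the parameter `k` are all assumptions made for illustration.

```python
from itertools import product


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def target_similarity(ligands_a, ligands_b) -> float:
    """Mean pairwise Tanimoto similarity between two ligand sets.

    Assumption: this averaging scheme is one simple choice, not
    necessarily the similarity measure used in the paper.
    """
    pairs = list(product(ligands_a, ligands_b))
    if not pairs:
        return 0.0
    return sum(tanimoto(x, y) for x, y in pairs) / len(pairs)


def related_targets(target, actives, k=2):
    """Return the k targets whose active ligands most resemble those of `target`."""
    scores = {
        other: target_similarity(actives[target], ligands)
        for other, ligands in actives.items()
        if other != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: each ligand is a set of hypothetical substructure feature ids.
actives = {
    "T1": [frozenset({1, 2, 3}), frozenset({2, 3, 4})],
    "T2": [frozenset({1, 2, 3, 5}), frozenset({2, 3, 4})],
    "T3": [frozenset({7, 8, 9}), frozenset({8, 9, 10})],
}

neighbors = related_targets("T1", actives, k=1)
print(neighbors)
```

In this toy setting, the ligands of `T2` share most of their features with those of `T1`, while `T3` shares none, so `T2` is selected as the related target. Activity data from the selected targets would then be passed to the second step (semi-supervised, multi-task, or ensemble learning), which this sketch does not cover.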
