QSAR Modeling Based on Conformation Ensembles Using a Multi-Instance Learning Approach

Modern QSAR approaches are widely used in drug discovery to design potentially bioactive molecules. Models based on 2D descriptors lose important information contained in the spatial structures of molecules. The major problem in building models with 3D descriptors is the choice of a putative bioactive conformation, which affects predictive performance. Multi-instance (MI) learning, which considers multiple conformations of each molecule during model training, could be a reasonable solution to this problem. In this study, we implemented several MI algorithms, both conventional and deep-learning-based, and investigated their performance. We compared MI-QSAR models with models built using the classical single-instance QSAR (SI-QSAR) approach, in which each molecule is encoded either by 2D descriptors computed for its molecular graph or by 3D descriptors computed for a single lowest-energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. We show that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.
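To make the multi-instance setting concrete, the following minimal sketch treats each molecule as a "bag" of conformer descriptor vectors with a single activity label per bag. It is not the implementation used in the study: the descriptors are random placeholders, mean pooling followed by ridge regression is only one simple way to turn bags into a trainable model, and the function names (`bag_mean_embedding`, `fit_ridge`, `predict`, `key_instance`) are hypothetical. Ranking conformers by their instance-level prediction is likewise an illustrative heuristic for pointing at a candidate bioactive conformation.

```python
import numpy as np


def bag_mean_embedding(bags):
    """Collapse each bag (n_conformers x n_descriptors) into one vector by averaging."""
    return np.stack([bag.mean(axis=0) for bag in bags])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the bias column is regularized too, for simplicity."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(w, X):
    """Apply the fitted weights to rows of X (bag embeddings or single conformers)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def key_instance(w, bag):
    """Index of the conformer with the highest instance-level prediction (heuristic)."""
    return int(np.argmax(predict(w, bag)))


# Toy data (assumption): 3 molecules with 3, 5 and 2 conformers, 4 descriptors each.
rng = np.random.default_rng(0)
bags = [rng.normal(size=(n, 4)) for n in (3, 5, 2)]
y = np.array([6.1, 7.4, 5.2])  # made-up activity values, one per molecule

X = bag_mean_embedding(bags)   # one row per molecule
w = fit_ridge(X, y, alpha=1.0)
bag_predictions = predict(w, X)
candidates = [key_instance(w, bag) for bag in bags]
```

A single-instance model would instead keep one row per molecule from the start (for example, the lowest-energy conformer only), so the comparison in the abstract comes down to whether the extra conformers in each bag carry useful signal.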
