Autocorrelation descriptor improvements for QSAR: 2DA_Sign and 3DA_Sign

Quantitative structure–activity relationship (QSAR) modeling is a branch of computer-aided drug discovery that relates chemical structure to biological activity. Two well-established and related QSAR descriptors are two- and three-dimensional autocorrelation (2DA and 3DA). These descriptors encode the relative positions of atoms or atom properties by measuring the separation between atom pairs, either as a number of bonds (2DA) or as a Euclidean distance (3DA). The values computed for all atom pairs in a given small molecule are summed into a histogram binned by separation. Atom properties can be incorporated by weighting each pair with the product of the two atoms' property values. This procedure can lead to information loss when signed atom properties, such as partial charge, are considered. For example, the product of two positive charges is indistinguishable from the product of two negative charges of equal magnitude. In this paper, we present variations of 2DA and 3DA, called 2DA_Sign and 3DA_Sign, that avoid this information loss by splitting each unique sign pair into its own histogram. We evaluate these variations with models trained on nine datasets spanning a range of drug target classes. Both 2DA_Sign and 3DA_Sign significantly increase model performance across all datasets when compared with traditional 2DA and 3DA. Lastly, we find that limiting 3DA_Sign to a maximum atom pair distance of 6 Å instead of 12 Å further increases model performance, suggesting that conformational flexibility may hinder performance with longer 3DA descriptors. Consistent with this finding, limiting the number of bonds in 2DA_Sign from 11 to 5 fails to improve performance.
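The sign-splitting idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the function names, the adjacency-list input, counting each unordered atom pair once, starting the bins at one bond, accumulating the absolute value of the product in the signed histograms, and skipping pairs whose product is zero.

```python
from collections import deque

SIGN_PAIRS = ("++", "--", "+-")  # assumed labels for the three sign combinations


def bond_distances(adjacency, source):
    """Breadth-first search: number of bonds from `source` to each reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def atom_pairs(adjacency, max_bonds):
    """Yield (i, j, d) for each unordered atom pair separated by 1..max_bonds bonds."""
    for i in range(len(adjacency)):
        for j, d in bond_distances(adjacency, i).items():
            if i < j and d <= max_bonds:
                yield i, j, d


def autocorrelation_2da(adjacency, props, max_bonds):
    """Traditional 2DA: a single histogram; bin d-1 sums p_i * p_j at d bonds."""
    hist = [0.0] * max_bonds
    for i, j, d in atom_pairs(adjacency, max_bonds):
        hist[d - 1] += props[i] * props[j]
    return hist


def autocorrelation_2da_sign(adjacency, props, max_bonds):
    """Sign-split variant: one histogram per sign pair (illustrative assumption:
    each histogram accumulates |p_i * p_j|; zero products are skipped)."""
    hists = {key: [0.0] * max_bonds for key in SIGN_PAIRS}
    for i, j, d in atom_pairs(adjacency, max_bonds):
        product = props[i] * props[j]
        if product == 0.0:
            continue
        if product < 0.0:
            key = "+-"
        elif props[i] > 0.0:
            key = "++"
        else:
            key = "--"
        hists[key][d - 1] += abs(product)
    return hists


# Two hypothetical diatomic fragments with mirrored charges.
bonded_pair = [[1], [0]]   # atom 0 is bonded to atom 1
cation_like = [0.5, 0.5]
anion_like = [-0.5, -0.5]

# Traditional 2DA cannot tell the two fragments apart ...
print(autocorrelation_2da(bonded_pair, cation_like, 2))
print(autocorrelation_2da(bonded_pair, anion_like, 2))
# ... whereas the sign-split histograms place them in different channels.
print(autocorrelation_2da_sign(bonded_pair, cation_like, 2))
print(autocorrelation_2da_sign(bonded_pair, anion_like, 2))
```

Under the same assumptions, a 3DA_Sign analogue would replace the bond count with a binned Euclidean distance between atom coordinates; the sign-pair bookkeeping stays the same.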
