Search for Predictive Generic Model of Aqueous Solubility Using Bayesian Neural Nets

Several predictive models of aqueous solubility have been published. They perform well on the data sets used to train them, but these data sets usually contain few structures similar to those of interest in drug research, so their applicability to drug discovery is questionable. A very diverse data set was assembled from compounds reported in the literature and from proprietary compounds. These compounds were grouped into a literature data set I, a proprietary data set II, and a mixed data set III formed by combining I and II. About 100 descriptors emphasizing surface properties were calculated for every compound. Bayesian learning of neural nets, which retains the advantages of neural nets while avoiding their weaknesses, was used to select the most parsimonious models and to train them on I, II, and III. The models were built either by selecting the most efficient descriptors one by one with a modified Gram-Schmidt procedure (GS) or by simplifying the most complete model with automatic relevance determination (ARD). The predictive ability of the models was assessed on validation data sets as unrelated to the training sets as possible, using two new parameters: NDD(x,ref), the normalized smallest descriptor distance of a compound x to a reference data set, and CD(x,mod), the combination of NDD(x,ref) with the dispersion of the Bayesian neural net calculations. The results show that a generic predictive model can be obtained from database I, that the diversity of database II is too restricted to give a model with good generalization ability, and that the ARD method applied to the mixed database III gives the best predictive model.
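
The one-by-one descriptor selection mentioned above can be illustrated with a generic forward Gram-Schmidt ranking. This is only a minimal sketch of the general technique, not the authors' modified procedure, whose details the abstract does not give. The function name `gram_schmidt_rank`, the use of the squared cosine with the residual target as the ranking score, and the tolerance `1e-12` are all assumptions made for illustration.

```python
import numpy as np


def gram_schmidt_rank(X, y, n_select):
    """Pick descriptor columns one by one via forward Gram-Schmidt ranking.

    Hypothetical helper: at each step the column with the largest squared
    cosine to the (residual) target is selected, then the target and all
    remaining columns are orthogonalized against it.
    """
    # Center descriptors and target (this also copies the inputs).
    X = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    y = np.asarray(y, dtype=float) - np.mean(y)
    remaining = list(range(X.shape[1]))
    selected = []
    tol = 1e-12  # assumed threshold for treating a vector as zero

    for _ in range(min(n_select, len(remaining))):
        y_norm = np.linalg.norm(y)
        if y_norm < tol:
            break  # target fully explained by the selected descriptors
        cols = X[:, remaining]
        norms = np.linalg.norm(cols, axis=0)
        usable = norms > tol
        if not usable.any():
            break  # remaining descriptors are linear combinations of selected ones
        # Squared cosine of the angle between each column and the target.
        cos2 = np.zeros(len(remaining))
        cos2[usable] = (cols[:, usable].T @ y) ** 2 / (norms[usable] ** 2 * y_norm ** 2)
        best = remaining.pop(int(np.argmax(cos2)))
        selected.append(best)
        # Unit vector along the chosen descriptor.
        q = X[:, best] / np.linalg.norm(X[:, best])
        # Remove the chosen direction from the target and the other columns.
        y = y - (q @ y) * q
        if remaining:
            X[:, remaining] -= np.outer(q, q @ X[:, remaining])
    return selected
```

Calling `gram_schmidt_rank(X, y, k)` on a descriptor matrix `X` (compounds in rows, descriptors in columns) and a solubility vector `y` returns the indices of up to `k` descriptors in the order they were chosen. In a workflow like the one described, such a ranking would only propose candidate inputs; the neural net itself would still have to be trained and validated separately.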