General Melting Point Prediction Based on a Diverse Compound Data Set and Artificial Neural Networks

We report the development of a robust and general model for the prediction of melting points. It is based on a diverse data set of 4173 compounds and employs a large number of 2D and 3D descriptors to capture molecular physicochemical and other graph-based properties. Dimensionality reduction is performed by principal component analysis, and a fully connected feed-forward back-propagation artificial neural network is used for model generation. The melting point is a fundamental physicochemical property of a molecule that is controlled both by single-molecule properties and by intermolecular interactions arising from packing in the solid state. It is therefore difficult to predict, and previous melting point models have been developed only for smaller, clearly defined compound sets. Here we derive the first general model that covers a comparatively large and relevant part of organic chemical space. The final model is based on 2D descriptors, which are found to contain more relevant information than the 3D descriptors calculated. Internal random validation of the model achieves a correlation coefficient of R² = 0.661 with an average absolute error of 37.6 °C. The model is internally consistent, with a test-set correlation coefficient of Q² = 0.658 (average absolute error 38.2 °C) and an internal-validation-set correlation coefficient of Q² = 0.645 (average absolute error 39.8 °C). Additional validation was performed on an external drug data set of 277 compounds. On this external data set a correlation coefficient of Q² = 0.662 (average absolute error 32.6 °C) was achieved, which demonstrates the model's ability to generalize. Compared with an earlier model for predicting the melting points of druglike compounds, our model performs slightly better despite covering a much larger chemical space. The remaining model error is attributed to molecular properties that single-molecule descriptors do not capture, namely inter- and intramolecular interactions and crystal packing; examples of outliers and the reasons for them are given.
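The two figures of merit quoted above, the correlation coefficient and the average absolute error, can be illustrated with a minimal sketch. It assumes that the correlation coefficient is the squared Pearson correlation between observed and predicted melting points; the abstract does not state which definition was used, and the coefficient of determination is another common choice. The function names and the sample values are hypothetical and are not taken from the paper.

```python
def average_absolute_error(observed, predicted):
    """Mean of |observed - predicted|, in the units of the inputs (e.g. degrees C)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: this is one possible reading of the R^2 / Q^2 values
    reported in the abstract, not a confirmed definition.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of squares and cross products about the means
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov * cov / (var_o * var_p)


# Hypothetical melting points in degrees C, for illustration only
observed_mp = [10.0, 20.0, 30.0]
predicted_mp = [12.0, 18.0, 33.0]
mae = average_absolute_error(observed_mp, predicted_mp)
r2 = squared_correlation(observed_mp, predicted_mp)
```

Applying the same two functions separately to the training, test, internal validation, and external sets would produce one pair of values per set, which is the form in which the results are reported above.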
