Mold2, Molecular Descriptors from 2D Structures for Chemoinformatics and Toxicoinformatics

Research applications in chemoinformatics and toxicoinformatics increasingly rely on molecular descriptors: numerical representations that capture the structural characteristics and properties of molecules. These representations are used for ADME/toxicity prediction, diversity analysis, library design, QSAR/QSPR, virtual screening, and other purposes. Molecular descriptors range from relatively simple ones calculated from two-dimensional (2D) chemical structures to more complex ones that represent three-dimensional (3D) chemical structures, or molecular fingerprints in which numerous bit positions each encode specific chemical information. The Mold2 software was developed to enable rapid calculation of a large and diverse set of descriptors encoding 2D chemical structure information. A comparative Shannon entropy analysis of Mold2 descriptors and those calculated by Cerius2, Dragon, and Molconn-Z on several data sets demonstrated that Mold2 descriptors convey a similar amount of information. In addition, when the same classification method was applied, models built from Mold2 descriptors performed slightly better than those built from the descriptors of the compared commercial software packages. The low computing cost of Mold2 makes it suitable not only for small data sets, as in QSAR, but also for virtual screening of large databases. Because Mold2 does not require 3D structures, and so does not depend on how 3D conformations are generated, its descriptors are expected to be highly reproducible and reliable. Mold2 is freely available to the public (http://www.fda.gov/nctr/science/centers/toxicoinformatics/index.htm).
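
To illustrate what "a descriptor encoding 2D chemical structure information" means in practice, the sketch below computes the Wiener index, a classic topological index defined as the sum of shortest-path (bond-count) distances over all pairs of heavy atoms in the hydrogen-suppressed molecular graph. This is a minimal sketch of a generic 2D descriptor, not Mold2's implementation; the adjacency-dictionary input format and the function names are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances (in bonds) from source to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def wiener_index(adjacency):
    """Sum of topological distances over all unordered pairs of heavy atoms.

    Assumes a connected, hydrogen-suppressed graph given as
    {atom_index: [bonded_neighbor_indices]} (a hypothetical input format).
    """
    distances = {atom: shortest_path_lengths(adjacency, atom) for atom in adjacency}
    return sum(distances[a][b] for a, b in combinations(adjacency, 2))


# Carbon skeletons only: n-butane is a 4-atom chain, isobutane a 3-armed star.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two C4H10 isomers share a molecular formula but differ in branching, and the index separates them (10 versus 9). Because the value depends only on the bond graph, it is identical no matter which 3D conformation the molecule adopts, which is the property the abstract credits for the reproducibility of 2D descriptors.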
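
The information-content comparison rests on Shannon entropy, SE = -Σ p_i log2 p_i, where p_i is the fraction of compounds whose descriptor value falls into histogram bin i. A descriptor that is nearly constant across a data set has entropy close to zero, while one that spreads compounds evenly over the bins approaches the maximum, log2(number of bins). The sketch below assumes equal-width bins spanning the observed minimum to maximum and a default of 10 bins; both choices are illustrative assumptions, not the binning scheme reported for the Mold2 comparison, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=10):
    """Shannon entropy (bits) of a descriptor's value distribution over equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    width = (hi - lo) / n_bins
    # Map each value to a bin index; the maximum value is clamped into the last bin.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def scaled_entropy(values, n_bins=10):
    """Entropy divided by its maximum, log2(n_bins), giving a value in [0, 1]."""
    return shannon_entropy(values, n_bins) / math.log2(n_bins)


# A descriptor that spreads ten compounds evenly over ten bins is maximally informative.
print(scaled_entropy(list(range(10))))
# A descriptor with the same value for every compound is uninformative.
print(shannon_entropy([5.0, 5.0, 5.0]))
```

Scaling by log2(n_bins) lets descriptors from different packages be compared on the same 0 to 1 scale, provided the same binning is applied to each.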
