Differential Shannon Entropy as a Sensitive Measure of Differences in Database Variability of Molecular Descriptors

A method termed Differential Shannon Entropy (DSE) is introduced to compare the information content and variance of molecular descriptors between compound databases. The analysis is based on histograms that record the individual and combined distributions of molecular descriptors, and on the calculation of Shannon entropy (SE), a formalism originally developed for digital communication. We have recently shown that SE values reflect the nonparametric variability of descriptor values. Here, the analysis is extended to assess differences in the information content of 143 molecular descriptors across databases containing synthetic compounds, natural products, or drug-like molecules. The DSE metric captures the degree to which the descriptor distributions of two databases complement or duplicate each other's information. We observe significant differences for a number of descriptors and rank these descriptors by their DSE values. DSE calculations thus make it possible to quantify the relative information content of different types of descriptors, even when the differences are subtle.
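The histogram-based workflow described above can be illustrated with a short sketch. It computes SE in bits as -sum(p_i * log2(p_i)) over histogram bins, where p_i is the fraction of compounds falling into bin i. It then computes DSE for one descriptor and ranks descriptors by their DSE values. Several points here are assumptions, since the abstract does not state them: both databases share the same bin edges for a given descriptor, out-of-range values are clamped into the end bins, and DSE is taken as SE of the combined (pooled) distribution minus the mean of the two individual SE values. All function names are hypothetical and do not come from the paper.

```python
import bisect
import math
from typing import Dict, List, Sequence, Tuple


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin; edges define len(edges) - 1 shared bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # bisect_right gives the insertion point; subtract 1 to get the bin index.
        i = bisect.bisect_right(edges, v) - 1
        # Assumption: values outside the range are clamped into the end bins.
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    return counts


def shannon_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy in bits of a histogram given as bin counts."""
    n = sum(counts)
    # Empty bins contribute nothing (p * log2 p -> 0 as p -> 0).
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def differential_shannon_entropy(
    a: Sequence[float], b: Sequence[float], edges: Sequence[float]
) -> float:
    """Assumed DSE form: SE of the pooled data minus the mean of the individual SEs."""
    se_a = shannon_entropy(histogram(a, edges))
    se_b = shannon_entropy(histogram(b, edges))
    se_ab = shannon_entropy(histogram(list(a) + list(b), edges))
    return se_ab - (se_a + se_b) / 2


def rank_by_dse(
    db_a: Dict[str, List[float]],
    db_b: Dict[str, List[float]],
    edges: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Rank descriptors (keys) from highest to lowest DSE between two databases."""
    scores = {
        name: differential_shannon_entropy(db_a[name], db_b[name], edges[name])
        for name in db_a
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: "desc_x" separates the two sets, "desc_y" does not.
    db_a = {"desc_x": [0.5, 0.6, 0.7, 0.8], "desc_y": [0.5, 1.5, 0.5, 1.5]}
    db_b = {"desc_x": [1.5, 1.6, 1.7, 1.8], "desc_y": [0.5, 1.5, 0.5, 1.5]}
    edges = {"desc_x": [0.0, 1.0, 2.0], "desc_y": [0.0, 1.0, 2.0]}
    for name, dse in rank_by_dse(db_a, db_b, edges):
        print(f"{name}: DSE = {dse:.3f} bits")
```

In this toy case, the descriptor whose values fall into different bins in the two sets receives the higher DSE, which reflects complementary information. The descriptor with identical distributions scores zero, which reflects duplicated information. Choosing bin edges per descriptor over the combined value range is one reasonable option, but it is an assumption of this sketch.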
