Design and Evaluation of a Molecular Fingerprint Involving the Transformation of Property Descriptor Values into a Binary Classification Scheme

A new fingerprint design concept is introduced that transforms molecular property descriptors into two-state descriptors and thus permits binary encoding. This transformation is based on the statistical medians of descriptor distributions in large compound collections and removes the need for value range encoding of these descriptors. For binary encoded property descriptors, bit positions that are set off capture as much information as bit positions that are set on, in contrast to conventional fingerprint representations. Accordingly, a variant of the Tanimoto coefficient has been defined for comparing these fingerprints. Following this design idea, a prototypic fingerprint termed MP-MFP was implemented by combining 61 binary encoded property descriptors with 110 structural fragment-type descriptors. The performance of this fingerprint was evaluated in systematic similarity search calculations on a database containing 549 molecules belonging to 38 different activity classes and 5000 background molecules. In these calculations, MP-MFP correctly recognized approximately 34% of all similarity relationships with only 0.04% false positives, and performed better than previous designs and MACCS keys. The results suggest that combinations of simplified two-state property descriptors have predictive value in the analysis of molecular similarity.
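
The median-based encoding and the need for a modified similarity measure can be illustrated with a minimal sketch. It is not the MP-MFP implementation: the function names are hypothetical, the rule "bit on when the value exceeds the reference median" is an assumed convention, and the abstract does not state the formula of the Tanimoto variant. The form used here, the average of the Tanimoto coefficient over on bits and over off bits, is only one plausible way to let off bits count as much as on bits.

```python
from statistics import median


def median_thresholds(descriptor_table):
    """Per-descriptor medians over a reference collection.

    descriptor_table: list of rows, one row of descriptor values per compound.
    """
    return [median(column) for column in zip(*descriptor_table)]


def binarize(values, thresholds):
    """Two-state encoding (assumed rule): 1 if value > median, else 0."""
    return [1 if v > t else 0 for v, t in zip(values, thresholds)]


def tanimoto(a, b):
    """Conventional Tanimoto coefficient, computed over on bits only."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen here: two all-zero strings count as identical.
    return both / either if either else 1.0


def symmetric_tanimoto(a, b):
    """Assumed variant: mean of Tanimoto over on bits and over off bits."""
    inv_a = [1 - x for x in a]
    inv_b = [1 - x for x in b]
    return 0.5 * (tanimoto(a, b) + tanimoto(inv_a, inv_b))


# Toy reference collection: 3 compounds x 3 property descriptors.
reference = [
    [1.0, 10.0, 0.5],
    [2.0, 20.0, 1.5],
    [3.0, 30.0, 2.5],
]
thresholds = median_thresholds(reference)

fp_a = binarize([3.0, 5.0, 2.0], thresholds)
fp_b = binarize([2.5, 25.0, 1.0], thresholds)
similarity = symmetric_tanimoto(fp_a, fp_b)
```

With the conventional coefficient, two fingerprints that agree only on off bits score zero. The averaged form rewards that agreement, which is the behavior the abstract calls for when off bits carry as much information as on bits.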
