ISIDA Property‐Labelled Fragment Descriptors

ISIDA Property‐Labelled Fragment Descriptors (IPLFs) were introduced as a general framework for numerically encoding molecular structures in chemoinformatics. They are counts of specific subgraphs whose atom vertices are coloured according to some local property or feature. By combining various colouring strategies of the molecular graph (notably pH‐dependent pharmacophore flagging and electrostatic potential‐based flagging) with several fragmentation schemes, the different IPLF subtypes range from classical atom pair and sequence counts to population levels of branched fragments or feature multiplets. The pH‐dependent feature flagging is performed separately for each significantly populated microspecies involved in the protolytic equilibrium. This may give IPLFs a competitive advantage over classical descriptors, even when the chosen fragmentation scheme is one of the state‐of‐the‐art pattern extraction procedures in chemoinformatics (feature sequence or pair counts, etc.). The implemented fragmentation schemes count (1) linear feature sequences, (2) feature pairs, (3) circular feature fragments, a.k.a. “augmented atoms”, or (4) feature trees. Fuzzy rendering, which optionally counts nonterminal fragment atoms as wildcards regardless of their specific colours/features, ensures a seamless transition between the “strict” counts (sequences or circular fragments) and the “fuzzy” multiplet counts (pairs or trees). Bond information may also be represented or ignored, leaving the user a wide choice of the resolution at which chemical information is extracted into the descriptors. Selected IPLF subsets, tree descriptors in particular, were successfully tested in both neighbourhood behaviour and QSAR modelling challenges, with very promising results.
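The fragmentation schemes above are described only in words. The Python sketch below illustrates three of them on a hypothetical five-atom toy graph with hand-assigned colours; it does not use the ISIDA software or any cheminformatics toolkit. It shows strict coloured sequence counts (optionally with bond symbols), fuzzy rendering in which nonterminal atoms become `*` wildcards, and feature pair counts labelled by topological distance. The colour labels, the function names (`sequence_counts`, `pair_counts`) and the string encoding of fragments are illustrative assumptions. Circular fragments and trees are not covered.

```python
from collections import Counter, deque

# Hypothetical toy molecule (acyclic, 5 atoms). Colours are illustrative
# feature flags only: "Hy" hydrophobic, "HA" H-bond acceptor, "HD" H-bond donor.
COLOURS = {0: "Hy", 1: "Hy", 2: "HD", 3: "HA", 4: "HA"}
# Bonds keyed by (lower index, higher index) -> bond symbol.
BONDS = {(0, 1): "-", (1, 2): "-", (1, 3): "=", (3, 4): "-"}

ADJ = {i: [] for i in COLOURS}
for i, j in BONDS:
    ADJ[i].append(j)
    ADJ[j].append(i)


def canonical(tokens):
    """Key shared by a fragment and its reversed reading."""
    tokens = tuple(tokens)
    return "".join(min(tokens, tokens[::-1]))


def simple_paths(min_atoms, max_atoms):
    """Yield each simple path with min_atoms..max_atoms atoms exactly once."""
    def extend(path):
        # Paths of 2+ atoms are reached from both ends; keep one direction.
        if len(path) >= min_atoms and (len(path) == 1 or path[0] < path[-1]):
            yield tuple(path)
        if len(path) < max_atoms:
            for nxt in ADJ[path[-1]]:
                if nxt not in path:
                    yield from extend(path + [nxt])

    for start in ADJ:
        yield from extend([start])


def sequence_counts(min_atoms, max_atoms, use_bonds=True, fuzzy=False):
    """Count coloured linear sequences; fuzzy=True turns inner atoms into '*'."""
    counts = Counter()
    for path in simple_paths(min_atoms, max_atoms):
        tokens = []
        for i, atom in enumerate(path):
            inner = 0 < i < len(path) - 1
            tokens.append("*" if fuzzy and inner else COLOURS[atom])
            if use_bonds and i < len(path) - 1:
                tokens.append(BONDS[tuple(sorted((atom, path[i + 1])))])
        counts[canonical(tokens)] += 1
    return counts


def pair_counts(max_dist):
    """Count colour pairs labelled by shortest-path (topological) distance."""
    counts = Counter()
    for a in ADJ:
        dist = {a: 0}  # breadth-first search from atom a
        queue = deque([a])
        while queue:
            atom = queue.popleft()
            for nxt in ADJ[atom]:
                if nxt not in dist:
                    dist[nxt] = dist[atom] + 1
                    queue.append(nxt)
        for b, d in dist.items():
            if a < b and d <= max_dist:
                # Encode distance d as d - 1 wildcards between the two colours.
                key = (COLOURS[a],) + ("*",) * (d - 1) + (COLOURS[b],)
                counts[canonical(key)] += 1
    return counts


if __name__ == "__main__":
    print(sequence_counts(1, 3))  # strict sequences, bond symbols kept
    fuzzy = sequence_counts(1, 3, use_bonds=False, fuzzy=True)
    print(fuzzy["HA*Hy"])  # 2: strict "HAHyHy" and "HAHAHy" merge
    # Acyclic graph: bond-free fuzzy sequences of 2-3 atoms == pairs up to distance 2.
    print(sequence_counts(2, 3, use_bonds=False, fuzzy=True) == pair_counts(2))  # True
```

In this acyclic toy graph, bond-free fuzzy sequences coincide with pair counts because exactly one path joins any two atoms. In ring systems several paths can join the same two atoms, so in this sketch fuzzy sequences would also count longer, non-shortest paths and the two descriptor types would diverge.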
IPLF subsets showed excellent results in similarity‐based virtual screening for protease inhibitor analogues, and yielded highly predictive models of the octanol–water partition coefficient and of hERG channel inhibition.
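The pH‐dependent flagging described above colours each significantly populated microspecies separately. The abstract does not say how the per-microspecies counts are merged into a single descriptor vector. The sketch below assumes a population-weighted average and uses a hypothetical `min_population` cutoff in place of "significantly populated". Populations are entered by hand; in practice they would come from a pKa-based microspecies predictor, which is not modelled here. The function name and the colour labels (`HA`, `HD`, `NC` for negative charge, `PC` for positive charge) are illustrative, not taken from the ISIDA software.

```python
from typing import Dict, List, Tuple

# One microspecies = (population fraction at the chosen pH, fragment counts
# computed on that microspecies' own pH-dependent colouring).
Microspecies = Tuple[float, Dict[str, int]]


def ph_weighted_counts(microspecies: List[Microspecies],
                       min_population: float = 0.01) -> Dict[str, float]:
    """Population-weighted average of per-microspecies fragment counts.

    Assumption (not stated in the abstract): species below min_population
    are discarded and the remaining populations are renormalised to sum to 1.
    """
    kept = [(p, counts) for p, counts in microspecies if p >= min_population]
    if not kept:
        raise ValueError("no microspecies above the population cutoff")
    total = sum(p for p, _ in kept)
    combined: Dict[str, float] = {}
    for population, counts in kept:
        weight = population / total
        for fragment, n in counts.items():
            combined[fragment] = combined.get(fragment, 0.0) + weight * n
    return combined


if __name__ == "__main__":
    # Hypothetical acid: 75 % anion, 25 % neutral form, traces of a third form.
    species = [
        (0.75, {"HA": 2, "NC": 1}),
        (0.25, {"HA": 1, "HD": 1}),
        (0.004, {"PC": 1}),  # below the cutoff, so ignored
    ]
    print(ph_weighted_counts(species))  # {'HA': 1.75, 'NC': 0.75, 'HD': 0.25}
```

Renormalising after the cutoff keeps the weights summing to one, so the averaged counts stay on the same scale as counts taken from a single microspecies.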
