Chemical Data Formats, Fingerprints, and Other Molecular Descriptions for Database Analysis and Searching

In this chapter we aim to provide a comprehensive yet reasonably compact overview of the ways in which molecules can be represented computationally. It includes a detailed introduction to the most commonly used chemical file formats (complemented by a few novel or more specialized representations), a thorough overview of the theoretical background of various molecular fingerprints and descriptors, and a section devoted to similarity measures and data fusion approaches. Finally, we list the most important online chemical databases and conclude the chapter with a short outlook on present trends and future expectations.


[229]  Changyong Liang,et al.  A collaborative filtering similarity measure based on potential field , 2016, Kybernetes.

[230]  E. Pretsch,et al.  A novel spectra similarity measure , 2007 .

[231]  Lemont B. Kier,et al.  An Electrotopological-State Index for Atoms in Molecules , 1990, Pharmaceutical Research.

[232]  Alex M. Clark,et al.  Accurate Specification of Molecular Structures: The Case for Zero-Order Bonds and Explicit Hydrogen Counting , 2011, J. Chem. Inf. Model..

[233]  Lemont B. Kier,et al.  Intermolecular Accessibility: The Meaning of Molecular Connectivity , 2000, J. Chem. Inf. Comput. Sci..

[234]  Z. Deng,et al.  Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions. , 2004, Journal of medicinal chemistry.

[235]  Malcolm J. McGregor,et al.  Pharmacophore Fingerprinting. 1. Application to QSAR and Focused Library Design , 1999, J. Chem. Inf. Comput. Sci..

[236]  M. Hahn Receptor surface models. 1. Definition and construction. , 1995, Journal of medicinal chemistry.

[237]  S. Free,et al.  A MATHEMATICAL CONTRIBUTION TO STRUCTURE-ACTIVITY STUDIES. , 1964, Journal of medicinal chemistry.

[238]  Zoran Obradovic,et al.  Computational Drug Repositioning by Ranking and Integrating Multiple Data Sources , 2013, ECML/PKDD.

[239]  Weiqiong Wang,et al.  Distance measure between intuitionistic fuzzy sets , 2005, Pattern Recognit. Lett..

[240]  Hwanjo Yu,et al.  Selective sampling techniques for feedback-based data retrieval , 2010, Data Mining and Knowledge Discovery.

[241]  Jon Winter,et al.  A System for Encoding and Searching Markush Structures , 2012, J. Chem. Inf. Model..

[242]  Sung-Hyuk Cha Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions , 2007 .

[243]  Stephen R. Heller,et al.  InChIKey collision resistance: an experimental testing , 2012, Journal of Cheminformatics.

[244]  Norbert Jankowski,et al.  Analysis of Feature Weighting Methods Based on Feature Ranking Methods for Classification , 2011, ICONIP.

[245]  Maciej Krawczak,et al.  On asymmetric matching between sets , 2015, Inf. Sci..

[246]  Andreas Bender,et al.  How Similar Are Similarity Searching Methods? A Principal Component Analysis of Molecular Descriptor Space , 2009, J. Chem. Inf. Model..

[247]  William J. Wiswesser,et al.  How the WLN began in 1949 and how it might be in 1999 , 1982, J. Chem. Inf. Comput. Sci..

[248]  P. Clemons,et al.  Chemogenomic data analysis: prediction of small-molecule targets and the advent of biological fingerprint. , 2007, Combinatorial chemistry & high throughput screening.

[249]  Lois E. Fritts,et al.  Using the Wiswesser line notation (WLN) for online, interactive searching of chemical structures , 1982, Journal of chemical information and computer sciences.

[250]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[251]  Gerald M. Maggiora,et al.  On Outliers and Activity Cliffs-Why QSAR Often Disappoints , 2006, J. Chem. Inf. Model..

[252]  George Papadatos,et al.  The ChEMBL bioactivity database: an update , 2013, Nucleic Acids Res..

[253]  Max Dobler,et al.  5D-QSAR: the key for simulating induced fit? , 2002, Journal of medicinal chemistry.

[254]  Roberto Todeschini,et al.  Distances and Other Dissimilarity Measures in Chemometrics , 2015 .

[255]  Marc C. Nicklaus,et al.  Enumeration of Ring–Chain Tautomers Based on SMIRKS Rules , 2014, J. Chem. Inf. Model..

[256]  Matthias Rarey,et al.  Feature trees: A new molecular similarity measure based on tree matching , 1998, J. Comput. Aided Mol. Des..

[257]  Andreas Zell,et al.  Optimization and visualization of the edge weights in optimal assignment methods for virtual screening , 2012, BioData Mining.

[258]  Jürgen Bajorath,et al.  Design and Evaluation of a Novel Class-Directed 2D Fingerprint to Search for Structurally Diverse Active Compounds , 2006, J. Chem. Inf. Model..

[259]  K. Héberger Sum of ranking differences compares methods or models fairly , 2010 .

[260]  Peter Willett,et al.  Combination Rules for Group Fusion in Similarity‐Based Virtual Screening , 2010, Molecular informatics.

[261]  Paolo Frasconi,et al.  Markov Logic Networks for Optical Chemical Structure Recognition , 2014, J. Chem. Inf. Model..

[262]  R. Webster Homer,et al.  SYBYL Line Notation (SLN): A Versatile Language for Chemical Structure Representation , 1997, J. Chem. Inf. Comput. Sci..

[263]  Thomas Lengauer,et al.  A new measure for functional similarity of gene products based on Gene Ontology , 2006, BMC Bioinformatics.

[264]  Peter Willett,et al.  Fusing similarity rankings in ligand-based virtual screening , 2013, Computational and structural biotechnology journal.

[265]  Dmitri B. Kireev,et al.  Structural Protein–Ligand Interaction Fingerprints (SPLIF) for Structure-Based Virtual Screening: Method and Benchmark Study , 2014, J. Chem. Inf. Model..

[266]  J. J. Vollmer Wiswesser Line Notation: An Introduction. , 1983 .

[267]  Gilles Marcou,et al.  Optimizing Fragment and Scaffold Docking by Use of Molecular Interaction Fingerprints , 2007, J. Chem. Inf. Model..

[268]  Supratik Mukhopadhyay,et al.  A graph-based approach to construct target-focused libraries for virtual screening , 2016, Journal of Cheminformatics.

[269]  Milan Randic,et al.  A New Descriptor for Structure-Property and Structure-Activity Correlations , 2001, J. Chem. Inf. Comput. Sci..

[270]  Lars Ridder,et al.  SyGMa: Combining Expert Knowledge and Empirical Scoring in the Prediction of Metabolites , 2008, ChemMedChem.

[271]  K. Héberger,et al.  Sum of ranking differences for method discrimination and its validation: comparison of ranks with random numbers , 2011 .

[272]  Amatzya Y. Meyer,et al.  Molecular mechanics and molecular shape. Part 1. van der Waals descriptors of simple molecules , 1985 .

[273]  S. L. Dixon,et al.  One-dimensional molecular representations and similarity calculations: methodology and validation. , 2001, Journal of medicinal chemistry.

[274]  John M. Barnard,et al.  Chemical Fragment Generation and Clustering Software , 1997, J. Chem. Inf. Comput. Sci..

[275]  Henry S. Rzepa,et al.  Chemical Markup, XML, and the World Wide Web. 6. CMLReact, an XML Vocabulary for Chemical Reactions , 2006, J. Chem. Inf. Model..

[276]  Robert P. Sheridan,et al.  Comparison of Topological, Shape, and Docking Methods in Virtual Screening , 2007, J. Chem. Inf. Model..

[277]  K. Héberger,et al.  Multivariate assessment of lipophilicity scales-computational and reversed phase thin-layer chromatographic indices. , 2016, Journal of pharmaceutical and biomedical analysis.

[278]  David Weininger,et al.  SMILES. 2. Algorithm for generation of unique SMILES notation , 1989, J. Chem. Inf. Comput. Sci..

[279]  M. Randic Characterization of molecular branching , 1975 .

[280]  Mika A. Kastenholz,et al.  GRID/CPCA: a new computational tool to design selective ligands. , 2000, Journal of medicinal chemistry.

[281]  Marina Lasagni,et al.  New molecular descriptors for 2D and 3D structures. Theory , 1994 .

[282]  Chris de Graaf,et al.  KLIFS: a structural kinase-ligand interaction database , 2015, Nucleic Acids Res..

[283]  V. Tewari,et al.  Calculation of heat of formation :- Molecular connectivity and IOC-ω technique, a comparative study , 1984 .

[284]  Peter Willett,et al.  Knowledge-Based Interaction Fingerprint Scoring: A Simple Method for Improving the Effectiveness of Fast Scoring Functions , 2006, J. Chem. Inf. Model..

[285]  Vladimir Poroikov,et al.  PASS: prediction of activity spectra for biologically active substances , 2000, Bioinform..

[286]  Roberto Todeschini,et al.  Structure/Response Correlations and Similarity/Diversity Analysis by GETAWAY Descriptors, 1. Theory of the Novel 3D Molecular Descriptors , 2002, J. Chem. Inf. Comput. Sci..

[287]  Robert D. Clark,et al.  SYBYL Line Notation (SLN): A Single Notation To Represent Chemical Structures, Queries, Reactions, and Virtual Libraries , 2008, J. Chem. Inf. Model..

[288]  Philip S. Yu,et al.  Substructure similarity search in graph databases , 2005, SIGMOD '05.

[289]  J. Bajorath,et al.  Chemoinformatics: a view of the field and current trends in method development. , 2012, Bioorganic & medicinal chemistry.

[290]  K. M. Smith,et al.  Novel software tools for chemical diversity , 1998 .

[291]  Lazaros Mavridis,et al.  Comprehensive Comparison of Ligand-Based Virtual Screening Tools Against the DUD Data set Reveals Limitations of Current 3D Methods , 2010, J. Chem. Inf. Model..

[292]  P. Jurs,et al.  Molecular shape and the prediction of high-performance liquid chromatographic retention indexes of polycyclic aromatic hydrocarbons. , 1987, Analytical chemistry.

[293]  Yanli Wang,et al.  PubChem: Integrated Platform of Small Molecules and Biological Activities , 2008 .

[294]  J. Bajorath,et al.  Scaffold hopping using two-dimensional fingerprints: true potential, black magic, or a hopeless endeavor? Guidelines for virtual screening. , 2010, Journal of medicinal chemistry.

[295]  G. S. Gill,et al.  Molecular surface point environments for virtual screening and the elucidation of binding patterns (MOLPRINT) , 2004 .

[296]  Michael H Abraham,et al.  Fast calculation of van der Waals volume as a sum of atomic and bond contributions and its application to drug compounds. , 2003, The Journal of organic chemistry.

[297]  I. Vidavsky,et al.  Comparing similar spectra: From similarity index to spectral contrast angle , 2002, Journal of the American Society for Mass Spectrometry.

[298]  P. Willett,et al.  A Fast Algorithm For Selecting Sets Of Dissimilar Molecules From Large Chemical Databases , 1995 .

[299]  B. Fan,et al.  Molecular similarity and diversity in chemoinformatics: From theory to applications , 2006, Molecular Diversity.

[300]  Arthur Dalby,et al.  Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited , 1992, J. Chem. Inf. Comput. Sci..

[301]  Xian Jin,et al.  Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints , 2015, Journal of Cheminformatics.

[302]  J. Sangshetti,et al.  Recent advances in multidimensional QSAR (4D-6D): a critical review. , 2014, Mini reviews in medicinal chemistry.

[303]  Robert P Sheridan,et al.  Why do we need so many chemical similarity search methods? , 2002, Drug discovery today.

[304]  Wei Deng,et al.  Deconvoluting complex patent Markush structures: A novel R-group numbering system , 2012 .

[305]  Jean-Louis Reymond,et al.  Atom Pair 2D-Fingerprints Perceive 3D-Molecular Shape and Pharmacophores for Very Fast Virtual Screening of ZINC and GDB-17 , 2014, J. Chem. Inf. Model..

[306]  C. Steinbeck,et al.  Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. , 2006, Current pharmaceutical design.

[307]  Paolo Massimo Buscema,et al.  Similarity Coefficients for Binary Chemoinformatics Data: Overview and Extended Comparison Using Simulated and Real Data Sets , 2012, J. Chem. Inf. Model..

[308]  D. Hardman,et al.  Reaction of the subunit of the Escherichia coli tryptophan synthetase with 1,5-difluoro-2,4-dinitrobenzene. , 1971, The Journal of biological chemistry.

[309]  Jochen Bauer,et al.  H2rs: Deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments , 2014, BMC Bioinformatics.

[310]  Egon L. Willighagen,et al.  Chemical Markup, XML, and the World Wide Web. 5. Applications of Chemical Metadata in RSS Aggregators , 2004, J. Chem. Inf. Model..

[311]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[312]  J. Devillers,et al.  New Trends in Structure‐Biodegradability Relationships , 1993 .

[313]  Michal Daszykowski,et al.  Clustering in analytical chemistry. , 2014, Journal of AOAC International.

[314]  Frederick P. Roth,et al.  Chemical substructures that enrich for biological activity , 2008, Bioinform..

[315]  Lorenz C. Blum,et al.  Classification of Organic Molecules by Molecular Quantum Numbers , 2009, ChemMedChem.

[316]  A. Bondi van der Waals Volumes and Radii , 1964 .

[317]  John P. Overington,et al.  ChEMBL: a large-scale bioactivity database for drug discovery , 2011, Nucleic Acids Res..

[318]  Daniel M. Lowe,et al.  Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity , 2015, J. Chem. Inf. Model..

[319]  Miin-Shen Yang,et al.  Similarity measures of intuitionistic fuzzy sets based on Lp metric , 2007, Int. J. Approx. Reason..

[320]  Remco M. Dijkman,et al.  Similarity of business process models: Metrics and evaluation , 2011, Inf. Syst..

[321]  Peter Willett,et al.  Evaluation of Similarity Measures for Searching the Dictionary of Natural Products Database , 2003, J. Chem. Inf. Comput. Sci..

[322]  Jerzy Leszczynski,et al.  Predicting water solubility of congeners: chloronaphthalenes--a case study. , 2009, Journal of hazardous materials.

[323]  Márcia M. C. Ferreira,et al.  Four-Dimensional Structure-Activity Relationship Model to Predict HIV-1 Integrase Strand Transfer Inhibition using LQTA-QSAR Methodology , 2012, J. Chem. Inf. Model..

[324]  Feng Gan,et al.  A spectral similarity measure using Bayesian statistics. , 2009, Analytica chimica acta.

[325]  Alfonso Valencia,et al.  CheNER: chemical named entity recognizer , 2014, Bioinform..

[326]  Y. Roggo,et al.  A review of near infrared spectroscopy and chemometrics in pharmaceutical technologies. , 2007, Journal of pharmaceutical and biomedical analysis.

[327]  Jürgen Bajorath,et al.  Fingerprint Scaling Increases the Probability of Identifying Molecules with Similar Activity in Virtual Screening Calculations , 2001, J. Chem. Inf. Comput. Sci..

[328]  A. Bender,et al.  In silico target fishing: Predicting biological targets from chemical structure , 2006 .

[329]  Gabriele Cruciani,et al.  A Common Reference Framework for Analyzing/Comparing Proteins and Ligands. Fingerprints for Ligands And Proteins (FLAP): Theory and Application , 2007, J. Chem. Inf. Model..

[330]  Igor V. Filippov,et al.  Optical Structure Recognition Software To Recover Chemical Information: OSRA, An Open Source Solution , 2009, J. Chem. Inf. Model..

[331]  P. Hawkins,et al.  Comparison of shape-matching and docking as virtual screening tools. , 2007, Journal of medicinal chemistry.

[332]  Neera Jain,et al.  Estimation of Aqueous Solubility By The General Solubility Equation (GSE) The Easy Way , 2003 .

[333]  H. Wiener Structural determination of paraffin boiling points. , 1947, Journal of the American Chemical Society.

[334]  Georgi K. Kanev,et al.  PDEStrIAn: A Phosphodiesterase Structure and Ligand Interaction Annotated Database As a Tool for Structure-Based Drug Design. , 2016, Journal of medicinal chemistry.

[335]  D. Lipman,et al.  Rapid and sensitive protein similarity searches. , 1985, Science.