Recent advances in chemoinformatics.

Chemoinformatics is a large scientific discipline that deals with the storage, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information. Chemoinformatics techniques are used extensively in drug discovery and development. Although many consider it a mature field, the advent of high-throughput experimental techniques and the need to analyze very large data sets have brought new life and new challenges to it. Here, we review a selection of papers published in 2006 that caught our attention for the novelty of their methodology. The field is growing significantly, and the widespread availability of public databases to support the development and validation of new approaches will further catalyze that growth.
