Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information

The Online Chemical Modeling Environment (OCHEM) is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: a database of experimental measurements and a modeling framework. The user-contributed database provides a set of tools for easy input, search and modification of thousands of records. It is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a wide variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. Unlike similar systems, OCHEM is not intended to re-implement existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and become members of a growing research community. Our intention is to make OCHEM a widely used platform for performing QSPR/QSAR studies online and sharing them with other users on the Web. The ultimate goal of OCHEM is to collect all possible chemoinformatics tools within one simple, reliable and user-friendly resource. OCHEM is free for web users and is available online at http://www.ochem.eu.
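The modeling steps listed above (descriptor calculation, machine learning, validation and applicability-domain assessment) can be illustrated with a minimal, self-contained sketch. It is not OCHEM code and does not use any OCHEM interface: the character-count descriptors, the nearest-neighbour regressor, the distance threshold and the property values are all assumptions chosen only to keep the example short.

```python
import math

def descriptors(smiles):
    """Toy descriptors (an assumption): C, O, N character counts and string length."""
    s = smiles.upper()
    return [float(s.count("C")), float(s.count("O")),
            float(s.count("N")), float(len(smiles))]

def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=2):
    """Return (prediction, mean distance to the k nearest training points)."""
    nearest = sorted((distance(x, query), y) for x, y in train)[:k]
    pred = sum(y for _, y in nearest) / len(nearest)
    mean_dist = sum(d for d, _ in nearest) / len(nearest)
    return pred, mean_dist

def leave_one_out_rmse(data, k=2):
    """Validation: predict each record from all the others, report the RMSE."""
    squared_errors = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        pred, _ = knn_predict(rest, x, k)
        squared_errors.append((pred - y) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def in_domain(train, query, threshold, k=2):
    """Applicability domain: accept a query only if it lies close to training data."""
    _, mean_dist = knn_predict(train, query, k)
    return mean_dist <= threshold

# Made-up property values for illustration only, not experimental measurements.
records = [("CCO", 1.0), ("CCCO", 2.0), ("CCCCO", 3.0), ("CCN", 1.5), ("CCCN", 2.5)]
train = [(descriptors(smi), value) for smi, value in records]

if __name__ == "__main__":
    print("leave-one-out RMSE:", leave_one_out_rmse(train))
    query = descriptors("CCCCN")
    print("prediction:", knn_predict(train, query)[0])
    print("inside applicability domain:", in_domain(train, query, threshold=2.0))
```

The distance to the nearest training compounds serves here as a simple applicability-domain measure: a query far from all training data is flagged as unreliable rather than predicted silently. A production system would replace each toy component with validated descriptor packages and learning methods.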
