ChemBCPP: A freely available web server for calculating commonly used physicochemical properties

Abstract The behavior of a chemical in the human body or the environment depends largely on several key physicochemical properties, such as aqueous solubility, octanol-water partition coefficient (logP), boiling point (BP), density, flash point (FP), viscosity, surface tension (ST), vapor pressure (VP) and melting point (MP). These properties are important in environmental science and drug discovery, for example in evaluating the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of medicinal compounds and in the risk assessment of problematic chemicals. At present, quantitative structure-property relationship (QSPR) models are widely applied to save time and money in the early stages of chemical research. Although some satisfactory models have already been obtained, most of them are not publicly available to researchers and thus cannot be directly applied to practical research projects. In this study, we developed a user-friendly web server named ChemBCPP that can be used to predict the aforementioned 8 important physicochemical properties and to calculate several other commonly used properties simply by uploading a molecular structure or file. In addition, for a new chemical entity, users obtain not only its predicted value but also a leverage value (h value), which can be used to evaluate the reliability of the prediction. We believe that ChemBCPP could be widely applied in environmental science, chemical synthesis and drug ADMET research, where high-quality chemical property data are in demand. ChemBCPP is freely available at http://chembcpp.scbdd.com .
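The abstract does not say how the leverage value is computed. As a minimal sketch, assuming the definition of leverage commonly used in QSAR/QSPR applicability-domain analysis, h = x^T (X^T X)^-1 x, where X is the training descriptor matrix and x is the descriptor vector of the query compound, the calculation could look as follows. The function names, the toy descriptor values and the warning threshold h* = 3p/n (p model parameters, n training compounds) are illustrative assumptions, not the ChemBCPP implementation.

```python
import numpy as np


def leverage(X_train, x_query):
    """Leverage h = x^T (X^T X)^-1 x of a query compound.

    X_train: (n_compounds, n_params) training descriptor matrix.
    x_query: (n_params,) descriptor vector of the new compound.
    A pseudo-inverse is used so that collinear descriptors do not
    raise an error (an assumption made for this sketch).
    """
    X = np.asarray(X_train, dtype=float)
    x = np.asarray(x_query, dtype=float)
    xtx_inv = np.linalg.pinv(X.T @ X)
    return float(x @ xtx_inv @ x)


def warning_leverage(n_compounds, n_params):
    """Commonly used warning threshold h* = 3p/n (assumed convention)."""
    return 3.0 * n_params / n_compounds


# Toy training set: 4 compounds described by 2 hypothetical descriptors.
X_train = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0],
                    [2.0, 1.0]])
h_star = warning_leverage(*X_train.shape)

h_inside = leverage(X_train, [1.0, 0.0])    # resembles the training data
h_outside = leverage(X_train, [10.0, 0.0])  # far from the training data

print(h_inside <= h_star, h_outside > h_star)
```

Under this convention, a query compound with h above h* lies far from the training compounds in descriptor space, so its predicted property value would be treated as less reliable.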
