Advantages of support vector machine in QSPR studies for predicting auto-ignition temperatures of organic compounds

Abstract A new method, the support vector machine (SVM), was used for the first time to develop quantitative structure–property relationship (QSPR) models for predicting the auto-ignition temperatures (AIT) of organic compounds. The calibration and predictive ability of SVM were investigated and compared with those of two other common methods, multiple linear regression (MLR) and the back-propagation neural network (BPNN). Two different datasets were evaluated. The first comprised 50 alkanes, whose structural characteristics were encoded with the widely used atom-type E-state indices as molecular descriptors. The second comprised 142 organic compounds, whose molecular structures were described using both physicochemical parameters and structure descriptors. Both internal and external validation were performed to assess the resulting models. For both datasets, the AIT values calculated by SVM were in good agreement with the experimental ones, and the SVM models performed comparably to or better than the MLR and BPNN models, especially in external predictive ability. This paper provides a new and effective method for predicting the AIT of organic compounds, and also shows that SVM can serve as a powerful chemometric tool for QSPR studies.
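The distinction between internal and external validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a one-descriptor least-squares line in place of the SVM, MLR and BPNN models, leave-one-out cross-validation as the internal check, a held-out test set as the external check, and synthetic data generated on the spot. All function names and numbers below are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def loo_q2(xs, ys):
    """Internal validation: leave-one-out cross-validated q^2."""
    preds = []
    for i in range(len(xs)):
        # Refit without compound i, then predict compound i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return r_squared(ys, preds)


def external_r2(x_train, y_train, x_test, y_test):
    """External validation: fit on the training set, score on unseen data."""
    a, b = fit_line(x_train, y_train)
    return r_squared(y_test, [a + b * x for x in x_test])


# Synthetic stand-in data (NOT the paper's AIT values or descriptors).
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(40)]
ys = [500.0 + 12.0 * x + rng.gauss(0.0, 5.0) for x in xs]

x_train, y_train = xs[:30], ys[:30]
x_test, y_test = xs[30:], ys[30:]

q2 = loo_q2(x_train, y_train)
r2_ext = external_r2(x_train, y_train, x_test, y_test)
print(f"internal LOO q2 = {q2:.3f}, external R2 = {r2_ext:.3f}")
```

A model can score well on the internal statistic and still predict unseen compounds poorly, which is why the abstract singles out external predictive ability when comparing the three methods.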
