Support vector machines and its applications in chemistry

Support vector machines (SVMs) are a promising machine learning method, originally developed for pattern recognition problems and based on structural risk minimization. Functionally, SVMs fall into two categories: support vector classification (SVC) machines and support vector regression (SVR) machines. Following this division, their basic elements and algorithms are discussed in some detail, and selected applications to two real-world datasets and two simulated datasets illustrate the good generalization performance of SVMs, especially on data with some nonlinearity.
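The split between SVC and SVR can be illustrated by the loss function each one typically minimizes. The sketch below is not taken from the paper: it assumes the standard textbook formulations (hinge loss for classification, epsilon-insensitive loss for regression) and a Gaussian (RBF) kernel as one common way to handle nonlinear data. The function names and default parameter values are hypothetical, chosen only for illustration.

```python
import math


def hinge_loss(y, f):
    """SVC-style loss for a label y in {-1, +1} and a decision value f.

    Assumed textbook form: zero once the sample is on the correct side
    of the margin (y * f >= 1), linear in the violation otherwise.
    """
    return max(0.0, 1.0 - y * f)


def eps_insensitive_loss(y, f, eps=0.1):
    """SVR-style loss: deviations smaller than eps cost nothing.

    eps is an illustrative default, not a value from the paper.
    """
    return max(0.0, abs(y - f) - eps)


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors.

    gamma is an illustrative default; in practice it is tuned.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    # A correctly classified sample outside the margin incurs no loss.
    print(hinge_loss(+1, 2.0))
    # A misclassified sample is penalized in proportion to the violation.
    print(hinge_loss(-1, 0.5))
    # A regression error inside the eps-tube is ignored.
    print(eps_insensitive_loss(1.0, 1.05))
    # Identical points have kernel similarity 1.
    print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))
```

Both losses are flat over a region (beyond the margin, or inside the epsilon-tube), so under these assumptions only the samples outside that region, the support vectors, shape the fitted model.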
