Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

In this study, we compare the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression (also known as projection to latent structures), polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol-gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here differ greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Because the chemical systems studied are so diverse, general conclusions about SVM regression methods can be drawn. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM are comparable to ANNs in accuracy. Because the SVM-based methods are also much more robust than ANNs, they are recommended for practical (industrial) applications. This is especially true for complicated, highly nonlinear objects.
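
To make the LS-SVM idea concrete, the following is a minimal sketch in standard-library Python, not the implementation used in the study. It fits an LS-SVM regressor with a Gaussian (RBF) kernel by solving the usual LS-SVM linear system and compares its root-mean-square error of prediction (RMSEP) with that of a straight-line fit. The data are synthetic (a noisy sine curve standing in for a nonlinear spectrum-property relation), the straight line is only a stand-in for a linear calibration and is not PLS, and the values of `gamma` and `sigma` as well as all function names are illustrative assumptions.

```python
import math
import random

def rbf(u, v, sigma):
    """Gaussian (RBF) kernel between two samples given as lists of floats."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization on the diagonal of K
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, coefficients alpha

def lssvm_predict(X_train, b, alpha, sigma, x):
    """Prediction f(x) = b + sum_i alpha_i * k(x, x_i)."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X_train))

def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Synthetic, assumed data: one input variable, nonlinear response, mild noise.
rng = random.Random(0)
X_train = [[i / 20.0] for i in range(21)]
y_train = [math.sin(2 * math.pi * x[0]) + rng.gauss(0.0, 0.05) for x in X_train]
X_test = [[(i + 0.5) / 20.0] for i in range(20)]
y_test = [math.sin(2 * math.pi * x[0]) for x in X_test]

# LS-SVM with illustrative (untuned) hyperparameters.
gamma, sigma = 100.0, 0.2
b, alpha = lssvm_fit(X_train, y_train, gamma, sigma)
pred_lssvm = [lssvm_predict(X_train, b, alpha, sigma, x) for x in X_test]

# Straight-line least-squares baseline (a stand-in for a linear model).
xs = [x[0] for x in X_train]
mx, my = sum(xs) / len(xs), sum(y_train) / len(y_train)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, y_train))
         / sum((x - mx) ** 2 for x in xs))
pred_linear = [my + slope * (x[0] - mx) for x in X_test]

err_lssvm = rmsep(y_test, pred_lssvm)
err_linear = rmsep(y_test, pred_linear)
print(f"RMSEP LS-SVM: {err_lssvm:.3f}  RMSEP linear: {err_linear:.3f}")
```

In practice `gamma` and `sigma` would be chosen by cross-validation on the calibration set, and real NIR spectra would have many more variables than the single input used here.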
