Determination of Water Content in Automobile Lubricant Using Near-Infrared Spectroscopy Improved by Machine Learning Analysis

The main objective of this paper is to determine the water content of automobile lubricant from collected near-infrared (NIR) spectra and to assess whether NIR spectroscopy can be used to predict water content. Least squares support vector machine (LS-SVM), back-propagation neural network (BPNN) and Gaussian process regression (GPR) methods were employed to develop prediction models. There were 150 samples for the training and test sets, and each sample had 6 inputs obtained by principal component analysis (PCA). LS-SVM models were developed with a grid search technique and a radial basis function (RBF) kernel. The Levenberg-Marquardt algorithm was employed to optimize the BPNN, and models with 5 and 6 neurons in the hidden layer were developed; the model with 5 neurons outperformed the one with 6 neurons. Three GPR models were built, based on the full set of data points (full GPR), a subset of regressors (SR GPR) and a subset of data points (SD GPR), each with a squared exponential (SE) covariance function. The full GPR outperformed SR GPR and SD GPR. The overall results indicated that the Gaussian process model outperformed the LS-SVM and BPNN models, and that GPR is an effective method for regression prediction. NIR spectroscopy combined with PCA and GPR was able to determine the water content of automobile lubricant with high accuracy.
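The PCA-plus-full-GPR pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the spectra are synthetic, the 100/50 train/test split and the hyperparameter values (`length_scale`, `signal_var`, `noise_var`) are assumptions chosen for illustration, and the function names `pca_fit`, `se_kernel` and `gpr_predict` are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column mean and leading principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def se_kernel(A, B, length_scale, signal_var):
    """Squared exponential covariance between rows of A and rows of B."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def gpr_predict(X_train, y_train, X_test,
                length_scale=2.0, signal_var=1.0, noise_var=1e-2):
    """Full GPR predictive mean and variance with fixed hyperparameters."""
    y_mean = y_train.mean()
    K = se_kernel(X_train, X_train, length_scale, signal_var)
    K += noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = se_kernel(X_test, X_train, length_scale, signal_var)
    mean = K_s @ alpha + y_mean
    v = np.linalg.solve(L, K_s.T)
    var = np.maximum(signal_var - np.sum(v**2, axis=0), 0.0)
    return mean, var

# Synthetic stand-in for NIR spectra (assumption): one absorption band whose
# height scales with water content, plus a random baseline offset and noise.
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 150, 200
grid = np.linspace(0.0, 1.0, n_wavelengths)
band = np.exp(-0.5 * ((grid - 0.6) / 0.05) ** 2)
water = rng.uniform(0.0, 2.0, n_samples)
spectra = (np.outer(water, band)
           + 0.1 * rng.normal(size=(n_samples, 1))
           + 0.01 * rng.normal(size=(n_samples, n_wavelengths)))

# Assumed split: first 100 samples for training, last 50 for testing.
X_train, X_test = spectra[:100], spectra[100:]
y_train, y_test = water[:100], water[100:]

# Fit PCA on the training spectra only, then project both sets to 6 scores.
mean_spec, components = pca_fit(X_train, n_components=6)
scores_train = (X_train - mean_spec) @ components.T
scores_test = (X_test - mean_spec) @ components.T

mean_pred, var_pred = gpr_predict(scores_train, y_train, scores_test)
rmse = float(np.sqrt(np.mean((mean_pred - y_test) ** 2)))
print(f"test RMSE: {rmse:.4f}")
```

In practice the hyperparameters would be tuned, for example by maximizing the marginal likelihood, rather than fixed by hand as they are here. The SR and SD variants would replace the full training set with a reduced set of basis points or training points.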
