Noise reduction for near-infrared spectroscopy data using extreme learning machines

Abstract Near-infrared (NIR) spectroscopy is an effective technique for predicting chemical properties and is widely applied in the petrochemical, agricultural, medical, and environmental sectors. NIR spectra are usually very high-dimensional and contain a large amount of information, most of which is irrelevant to the target problem and some of which is simply noise. Discovering the relationship between NIR spectra and the variable to be predicted is therefore not an easy task. Because this kind of regression analysis is one of the main topics of machine learning, machine learning techniques play a key role in NIR-based analytical approaches. Pre-processing of NIR spectral data has become an integral part of chemometric modeling. Its objective is to remove unwanted physical phenomena (noise) from the spectra in order to improve the regression or classification model. In this work, we propose to reduce the noise using extreme learning machines, which have shown good predictive performance in regression applications as well as in classification tasks on large datasets. For this, we use a novel algorithm called C-PL-ELM, whose architecture places two non-linear layers in parallel. Using the concept of the soft margin loss function, we incorporate two Lagrange multipliers in order to account for the noise in the spectral data. Six real-life datasets were analyzed to illustrate the performance of the developed models. The results for regression and classification problems confirm the advantages of the proposed method in terms of root mean square error and accuracy.
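For readers unfamiliar with extreme learning machines, the following is a minimal sketch of a plain, single-hidden-layer ELM for regression, evaluated with root mean square error. It is not the C-PL-ELM algorithm described above: the parallel-layer architecture and the soft margin formulation with two Lagrange multipliers are not specified in the abstract and are not reproduced here. The function names (`elm_fit`, `elm_predict`, `rmse`), the ridge parameter `reg`, the tanh activation, and the synthetic data standing in for spectra are all assumptions made for illustration.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, reg=1e-3, seed=0):
    """Fit a basic ELM: random hidden layer, ridge-regularized output weights."""
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are drawn at random and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer output matrix
    # Output weights from the ridge normal equations:
    # beta = (H^T H + reg * I)^{-1} H^T y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, model):
    """Predict with a fitted ELM model tuple (W, b, beta)."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for spectra (assumption): 200 samples, 20 variables,
# with a noisy non-linear target.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 20))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * data_rng.normal(size=200)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

model = elm_fit(X_train, y_train)
train_rmse = rmse(y_train, elm_predict(X_train, model))
test_rmse = rmse(y_test, elm_predict(X_test, model))
print(f"train RMSE: {train_rmse:.3f}, test RMSE: {test_rmse:.3f}")
```

Only the output weights are learned, through a single linear solve, which is why ELMs train quickly. The ridge term `reg` is one simple way to limit sensitivity to noise; the paper's soft margin approach is a different mechanism.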
