Big data driven outlier detection for soybean straw near infrared spectroscopy

Abstract In near infrared spectroscopy (NIRS) analysis, the predictive ability of a calibration model is seriously degraded by outliers, which may arise from errors in the spectral measurements, in the chemical analysis, or in both. This paper describes an outlier detection method based on NIRS analysis data of soybean straw. We improved the resampling by half-mean (RHM) method by adding a confidence interval (the resulting method is denoted IRHM) and combined IRHM with Cook's distance (IRHM-COOK) to detect outlier samples in the NIRS data. The confidence interval is an important parameter of the IRHM-COOK method: the optimal confidence intervals found separately for the IRHM and Cook's distance methods are combined and used as the confidence interval for IRHM-COOK. The confidence interval is selected so that the detection of spectrum outliers and the detection of chemical outliers remain relatively independent. Experimental results with a partial least squares regression (PLS) model show that the IRHM-COOK method is superior to the traditional Mahalanobis distance method, the IRHM method alone, and the Cook's distance method alone. The determination coefficient (R²) of the hemicellulose PLS calibration model increased from 0.4397918 to 0.5333039, and the root mean square error (RMSE) decreased from 0.7926415 to 0.7287254. The PLS models for lignin and cellulose also performed better with the IRHM-COOK method than the original models. These results show that the IRHM-COOK method can effectively identify both spectrum outliers and chemical outliers in soybean straw biomass. The method is also effective for NIRS analysis data that contain only one type of outlier, as demonstrated by an NIRS analysis of starch.
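
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the function names `rhm_counts` and `cooks_distance` are hypothetical, the data are synthetic, Cook's distance is computed for an ordinary least-squares fit rather than a PLS model, and the confidence-interval rule that distinguishes IRHM from RHM is not specified in the abstract, so the sketch simply takes the most frequently flagged sample as the spectrum outlier.

```python
import numpy as np

def rhm_counts(X, n_iter=500, top_frac=0.05, seed=0):
    """Resampling by half-means (plain RHM, no confidence interval).

    Each iteration scales all samples with the mean and standard deviation
    of a random half of the data, then records which samples have the
    largest vector lengths. Returns how often each sample was recorded.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = max(1, int(round(top_frac * n)))          # samples recorded per iteration
    counts = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        half = rng.choice(n, size=n // 2, replace=False)
        mu = X[half].mean(axis=0)
        sd = X[half].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0                          # guard against constant columns
        lengths = np.linalg.norm((X - mu) / sd, axis=1)
        counts[np.argsort(lengths)[-k:]] += 1
    return counts

def cooks_distance(T, y):
    """Cook's distance for an OLS fit of y on T with an intercept.

    Assumption: OLS stands in for the PLS model used in the paper.
    """
    A = np.column_stack([np.ones(len(y)), T])
    p = A.shape[1]
    H = A @ np.linalg.pinv(A)                      # hat (projection) matrix
    h = np.diag(H)                                 # leverages
    resid = y - H @ y
    s2 = resid @ resid / (len(y) - p)              # residual variance estimate
    return resid**2 / (p * s2) * h / (1 - h) ** 2

# Synthetic data (assumption): 60 "spectra" with 20 variables each.
rng = np.random.default_rng(42)
n = 60
X = rng.normal(size=(n, 20))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
X[0] += 10.0   # sample 0: spectrum outlier (shifted spectrum)
y[1] += 5.0    # sample 1: chemical outlier (wrong reference value)

# Step 1: find the spectrum outlier from the spectra alone.
counts = rhm_counts(X)
spectral = int(np.argmax(counts))

# Step 2: drop it, then look for the chemical outlier with Cook's distance.
keep = np.arange(n) != spectral
D = cooks_distance(X[keep][:, :3], y[keep])
chem = int(np.flatnonzero(keep)[np.argmax(D)])

print("spectrum outlier:", spectral, "chemical outlier:", chem)
```

Running the spectral step before the regression-based step mirrors the abstract's stated goal of keeping the detection of spectrum outliers and chemical outliers relatively independent: a high-leverage spectrum would otherwise distort the Cook's distance values of the remaining samples.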
