Classification of lignocellulosic biomass by weighted‐covariance factor fuzzy C‐means clustering of mid‐infrared and near‐infrared spectra

The analysis of lignocellulosic materials is crucial to optimizing conversion efficiencies in biorefineries and to studying the input of crop residues to soil nutrient cycles. Mid‐infrared (MIR) and near‐infrared (NIR) spectroscopies are rapid, simple, and nondestructive methods for determining biomass composition. However, conventional methods of data processing, such as partial least squares, generally cannot be applied to a small set of plant biomass samples. Additionally, IR spectra are not distributed spherically in the data space. To overcome these problems, we propose a weighted‐covariance factor fuzzy C‐means clustering method combined with bootstrapping. The algorithm can classify both spherical and nonspherical clusters, in contrast to classic fuzzy C‐means, which is suited only to spherical clusters. Bootstrapping resamples the available spectra to generate several datasets on which the classification is performed. This unsupervised clustering methodology was tested by classifying a small set of maize roots in soil according to genotype or to the period of their biodegradation process, based on their NIR and MIR spectra. The methodology was applied to determine the optimal pretreatment of IR spectra, to study the contribution of combining MIR and NIR spectra, and to compare the results obtained on spectral and chemical data. The results show that the best pretreatment is the first‐order Savitzky‐Golay derivative followed by standard normal variate. MIR spectra gave better results than NIR spectra both for the initial characterization and for dynamic samples, and MIR spectra acquired on raw samples, without soluble extraction, provided better classification than wet chemistry.
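
The pretreatment named above (a first‐order Savitzky‐Golay derivative followed by standard normal variate) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the window width or polynomial order, so the sketch assumes a 5‐point window and a local polynomial of order 1 or 2, for which the derivative weights have the closed form k / Σk². The function names and the `half_window` and `spacing` parameters are hypothetical.

```python
from statistics import mean, stdev


def savgol_first_derivative(spectrum, half_window=2, spacing=1.0):
    """First derivative from a local least-squares polynomial fit.

    Assumes polynomial order 1 or 2, where the Savitzky-Golay derivative
    weights reduce to k / sum(k^2) for k = -m..m. Points without a full
    window (the first and last m points) are dropped, not padded.
    """
    m = half_window
    offsets = range(-m, m + 1)
    norm = sum(k * k for k in offsets) * spacing
    return [
        sum(k * spectrum[i + k] for k in offsets) / norm
        for i in range(m, len(spectrum) - m)
    ]


def standard_normal_variate(spectrum):
    """Center one spectrum on its own mean and scale by its own
    sample standard deviation (n - 1 in the denominator)."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # zero for a flat spectrum, which would fail below
    return [(value - mu) / sigma for value in spectrum]


def pretreat(spectrum, half_window=2, spacing=1.0):
    """Apply the derivative first, then SNV, in the order the abstract reports."""
    derivative = savgol_first_derivative(spectrum, half_window, spacing)
    return standard_normal_variate(derivative)
```

Each spectrum is pretreated independently, so `pretreat` would be called once per sample before clustering. In practice, a library routine such as SciPy's `savgol_filter` would likely replace the hand‐written derivative.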
