Predicting organic acid concentration from UV/vis spectrometry measurements – a comparison of machine learning techniques

The concentration of organic acids in anaerobic digesters is one of the most critical parameters for monitoring and advanced control of anaerobic digestion processes. A reliable online measurement system is therefore essential. This paper presents a novel approach to obtaining these measurements indirectly and online using UV/vis spectroscopic probes in conjunction with powerful pattern recognition methods. A UV/vis spectroscopic probe from S::CAN is used in combination with a custom-built dilution system to monitor the absorption of fully fermented sludge over a spectrum from 200 to 750 nm. Advanced pattern recognition methods are then used to map the non-linear relationship between the measured absorption spectra and laboratory measurements of organic acid concentrations. Linear discriminant analysis, generalized discriminant analysis (GerDA), support vector machines (SVM), relevance vector machines, random forests and neural networks are investigated for this purpose and their performance is compared. To validate the approach, online measurements were taken at a full-scale 1.3-MW industrial biogas plant. The results show that, although some of the methods considered do not yield satisfactory results, organic acid concentration ranges can be predicted accurately with both GerDA and SVM-based classifiers, with classification rates in excess of 87% achieved on test data.
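The classification setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class boundaries, the synthetic spectra and the nearest-centroid classifier are all assumptions introduced for illustration, with the nearest-centroid rule standing in for the GerDA and SVM classifiers studied in the paper. The sketch shows only how laboratory concentrations might be binned into range classes and how a classification rate on held-out test data could be computed.

```python
import random
from bisect import bisect_right

# Hypothetical class boundaries (arbitrary units); the paper's ranges are not given here.
BIN_EDGES = [1.0, 2.5]


def concentration_to_class(conc, edges=BIN_EDGES):
    """Map a laboratory concentration to the index of its range class."""
    return bisect_right(edges, conc)


def synthetic_spectrum(conc, rng, n_wavelengths=56):
    """Toy absorption spectrum on a 200 to 750 nm grid (10 nm steps); not probe data."""
    spectrum = []
    for i in range(n_wavelengths):
        wavelength = 200 + 10 * i
        # Assumed shape: absorbance grows with concentration and decays with wavelength.
        base = conc * 2.0 ** (-(wavelength - 200) / 100.0)
        spectrum.append(base + rng.gauss(0.0, 0.02))
    return spectrum


def fit_centroids(spectra, labels):
    """Compute the mean spectrum of each class (stand-in for GerDA or an SVM)."""
    sums, counts = {}, {}
    for x, y in zip(spectra, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign a spectrum to the class with the nearest mean spectrum."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def classification_rate(y_true, y_pred):
    """Fraction of samples whose predicted range class matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def run_demo(seed=0):
    """Train on 200 synthetic samples and report the rate on 100 held-out ones."""
    rng = random.Random(seed)
    concs = [rng.uniform(0.2, 4.0) for _ in range(300)]
    spectra = [synthetic_spectrum(c, rng) for c in concs]
    labels = [concentration_to_class(c) for c in concs]
    centroids = fit_centroids(spectra[:200], labels[:200])
    predictions = [predict(centroids, x) for x in spectra[200:]]
    return classification_rate(labels[200:], predictions)


if __name__ == "__main__":
    print(f"classification rate on synthetic test data: {run_demo():.2f}")
```

Framing the task as classification into concentration ranges, rather than regression onto exact values, matches the abstract's reporting of classification rates. Any rate produced by this toy data says nothing about the performance reported for the real plant measurements.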
