A Bayesian learning and data mining approach to reaction system identification: Application to biomass conversion

The growing environmental concern over the use of fossil fuels calls for alternative energy sources with a smaller environmental footprint, and biomass-derived fuels have been extensively investigated as a substitute. In biofuel production, developing reaction networks and kinetic models is a major challenge because the reaction products are difficult to characterize. Better methods are therefore needed to extract information about the reaction from the available experimental data. This study uses a data mining and Bayesian learning approach to estimate the reaction network of the acid- and base-catalyzed hydrous pyrolysis of hemicellulose from Fourier transform infrared (FTIR) spectroscopy data. Cluster analysis is used to represent the system in terms of lumps, and a Bayesian network structure-learning algorithm is then used to devise a reaction network. Three Bayesian network structure-learning algorithms were implemented to estimate the reaction network, and all three produced identical results, indicating that the learned model most likely lies in the optimal equivalence space. The model was compared against expert-based reaction models, and the agreement is encouraging. A useful feature of the model is its self-updating capability: from spectroscopic data, it can quantitatively describe the effect of a change in operating conditions. Hence, the model may be used for real-time analysis of the investigated process.
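The abstract does not say which clustering algorithm is used to form the lumps. As a hypothetical stand-in, the Python sketch below assumes each FTIR channel (one wavenumber) is a row of absorbance values over time, groups channels with plain k-means on standardized profiles, and averages each group into a lump signal. The function names `kmeans_lumps` and `lump_signals` are illustrative, not taken from the study.

```python
# Hypothetical stand-in: the source does not specify the clustering algorithm.
import math
from typing import List, Optional, Sequence

Profile = List[float]


def standardize(profile: Sequence[float]) -> Profile:
    """Center and scale one channel's absorbance time course so that
    clustering groups channels by shape rather than by magnitude."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n) or 1.0  # flat channel: avoid /0
    return [(x - mean) / sd for x in profile]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_lumps(channels: List[Sequence[float]], k: int, max_iter: int = 100) -> Optional[List[int]]:
    """Assign each spectral channel (one row per wavenumber) to one of k lumps.

    Plain k-means on standardized profiles, seeded deterministically with
    farthest-point initialization starting from channel 0."""
    data = [standardize(c) for c in channels]
    centers: List[Profile] = [data[0]]
    while len(centers) < k:
        # next seed: the channel farthest from all seeds chosen so far
        centers.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def lump_signals(channels: List[Sequence[float]], labels: List[int], k: int) -> List[Profile]:
    """Lump signal at each time point = mean raw absorbance of its member channels."""
    signals: List[Profile] = []
    for j in range(k):
        members = [c for c, lab in zip(channels, labels) if lab == j]
        signals.append([sum(col) / len(members) for col in zip(*members)] if members else [])
    return signals
```

For example, given two channels that decay over time and two that grow, `kmeans_lumps(channels, 2)` places the decaying pair in one lump and the growing pair in the other, since standardization removes the difference in absolute absorbance.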
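The structure-learning step can be illustrated with one common score-based family: greedy hill-climbing over directed acyclic graphs, scored by the Bayesian information criterion (BIC). This is a minimal sketch, not the authors' implementation; the abstract names neither the three algorithms nor the scoring function. It assumes the lump signals have already been discretized into integer states, and the names `family_bic`, `is_acyclic`, and `hill_climb` are hypothetical.

```python
# Hypothetical sketch: greedy BIC hill-climbing, not the study's own code.
# Assumes each data row is one observation of all lumps, discretized to ints.
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple


def family_bic(data: List[Tuple[int, ...]], child: int,
               parents: Tuple[int, ...], card: Sequence[int]) -> float:
    """BIC contribution of one discrete node given its parent set."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    q = math.prod(card[p] for p in parents)  # number of parent configurations
    return loglik - 0.5 * math.log(n) * q * (card[child] - 1)


def is_acyclic(parents: Dict[int, Set[int]]) -> bool:
    """Kahn-style check: repeatedly strip nodes that have no remaining parents."""
    remaining = {v: set(ps) for v, ps in parents.items()}
    while remaining:
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False  # every remaining node has a parent: there is a cycle
        for r in roots:
            del remaining[r]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True


def hill_climb(data: List[Tuple[int, ...]], card: Sequence[int],
               max_iter: int = 100) -> Dict[int, Set[int]]:
    """Apply the single edge addition, deletion, or reversal that most
    improves total BIC; stop when no move helps (a local optimum)."""
    nodes = range(len(card))
    parents: Dict[int, Set[int]] = {v: set() for v in nodes}
    score = {v: family_bic(data, v, (), card) for v in nodes}
    for _ in range(max_iter):
        best_gain, best_move = 1e-9, None
        for u, v in product(nodes, nodes):
            if u == v:
                continue
            if u not in parents[v]:
                moves = [{v: parents[v] | {u}}]                       # add u -> v
            else:
                moves = [{v: parents[v] - {u}},                       # delete u -> v
                         {v: parents[v] - {u}, u: parents[u] | {v}}]  # reverse to v -> u
            for change in moves:
                if not is_acyclic({**parents, **change}):
                    continue
                # BIC decomposes over families, so only changed nodes are rescored
                gain = sum(family_bic(data, w, tuple(sorted(ps)), card) - score[w]
                           for w, ps in change.items())
                if gain > best_gain:
                    best_gain, best_move = gain, change
        if best_move is None:
            break
        for w, ps in best_move.items():
            parents[w] = ps
            score[w] = family_bic(data, w, tuple(sorted(ps)), card)
    return parents


if __name__ == "__main__":
    # Toy data: variable 1 copies variable 0; variable 2 is independent of both.
    toy = [(i % 2, i % 2, (i // 2) % 2) for i in range(200)]
    print(hill_climb(toy, [2, 2, 2]))  # prints {0: set(), 1: {0}, 2: set()}
```

On the toy data, the edges 0 -> 1 and 1 -> 0 receive identical BIC scores, because BIC assigns equal scores to Markov-equivalent graphs; the direction reported depends only on the order in which moves are tried. This is one reason agreement between structure-learning algorithms is naturally judged up to an equivalence class rather than edge by edge.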
