Data-oriented research for bioresource utilization: A case study to investigate water uptake in cellulose using Principal Components

Bioresource utilization is an important interdisciplinary research field that integrates academic and industrial expertise across diverse scientific domains, including physics, chemistry, biology, and engineering. This paper describes a cyber-infrastructure being created at the Brazilian Bioethanol Science and Technology Laboratory (CTBE) to assist scientists working in the field. One key element of the infrastructure is the LignoCel Platform, a tailor-made database for uploading, curating, and sharing lignocellulose data. In particular, LignoCel allows users to query the data and export subsets for analysis and knowledge extraction. We describe a case study in which scientists investigate the dimensions that relate cellulose structure to water uptake. Principal Component Analysis (PCA) is employed for data analysis and dimensionality reduction. Different PCA-based measurements are extracted and visualized through automatically generated HTML pages made available to the domain scientists. In this case study, the workflow successfully reduced the dimensionality of a data matrix originating from a heterogeneous set of materials, and PCA scores and loadings are explored for data analysis and visualization. PCA reduced the 11 measured features (obtained from three different experimental techniques; 55 possible pairwise combinations) to a two-dimensional PC1-PC2 loadings plot representing 89% of the data variance. Examples of the output produced by the system are available at http://data.bioetanol.org.br/~liu.ling/pca-lignocel/.
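As a rough illustration of the kind of analysis summarized above, the following minimal sketch computes PCA scores, loadings, and explained-variance ratios with NumPy. It is not the LignoCel workflow: the data are synthetic, the function name `pca` and the sample count of 20 are hypothetical, and standardizing each feature before the decomposition is an assumption (a common choice when features come from different techniques and units).

```python
import math
import numpy as np

def pca(X):
    """PCA via SVD of the column-standardized data matrix (assumed preprocessing)."""
    X = np.asarray(X, dtype=float)
    # Center each feature and scale it to unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Thin SVD: Z = U * diag(s) * Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                       # sample coordinates on the PCs
    loadings = Vt.T                      # one column of feature weights per PC
    explained = s**2 / np.sum(s**2)      # fraction of variance per PC
    return scores, loadings, explained

# Synthetic stand-in for a samples-by-features matrix: 20 hypothetical
# samples and 11 features driven by two latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 2))
mixing = rng.normal(size=(2, 11))
X = latent @ mixing + 0.1 * rng.normal(size=(20, 11))

scores, loadings, explained = pca(X)

# Number of distinct feature pairs that a 2-D scatter plot could show.
n_pairs = math.comb(11, 2)

print("variance captured by PC1 and PC2:", explained[:2].sum())
print("feature pairs:", n_pairs)
```

Plotting the first two columns of `loadings` against each other gives a PC1-PC2 loadings plot; the first two columns of `scores` give the corresponding view of the samples.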
