Multiblock analysis of environmental measurements: A case study of using Proton Induced X-ray Emission and meteorology dataset obtained from Islamabad, Pakistan

Abstract This paper reports the analysis of a multiblock environmental dataset consisting of 176 samples collected in Islamabad, Pakistan, between February 2006 and August 2007. For both coarse and fine particulate matter, the concentrations of 32 elements, measured using Proton Induced X-ray Emission (PIXE), plus black carbon were determined in each sample. Six meteorological parameters were also recorded: maximum and minimum daily temperature, humidity, rainfall, wind speed and pressure. The data were explored using Principal Components Analysis (PCA), Partial Least Squares (PLS), Consensus PCA, Multiblock PLS, the Mantel test, Procrustes analysis and the RV coefficient. Seasonal trends can be identified and interpreted. The PLS models show that the elemental composition of the airborne particulate matter (APM) can be used to predict meteorological parameters. The block similarity measures show that fine APM resembles the meteorological parameters more closely than coarse APM does. Multiblock PLS models, however, perform no better than classical PLS regression (PLSR). This paper also demonstrates the potential of the multiblock approach in environmental monitoring.
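As an illustration of one of the block similarity measures named in the abstract, the RV coefficient between two column-centred blocks X and Y measured on the same samples can be written as RV = tr(XXᵀYYᵀ) / sqrt(tr((XXᵀ)²) · tr((YYᵀ)²)). The sketch below is a minimal NumPy version of that formula, not the authors' implementation. The function name `rv_coefficient`, the block sizes (33 particulate variables, 6 meteorological variables) and all data are assumptions: the arrays are random numbers generated only to exercise the code and say nothing about the Islamabad measurements.

```python
import numpy as np


def rv_coefficient(X, Y):
    """RV coefficient between two blocks sharing the same rows (samples).

    Hypothetical helper for illustration; columns are mean-centred first.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx = Xc @ Xc.T  # n x n sample configuration matrix of block X
    Sy = Yc @ Yc.T  # n x n sample configuration matrix of block Y
    num = np.trace(Sx @ Sy)
    den = np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))
    return num / den


# Synthetic stand-in data (assumption, NOT the paper's measurements).
rng = np.random.default_rng(0)
n = 176                                  # number of samples, as in the abstract
fine = rng.normal(size=(n, 33))          # assumed: 32 elements + black carbon
coarse = rng.normal(size=(n, 33))        # independent of the "met" block below
met = fine[:, :6] + 0.5 * rng.normal(size=(n, 6))  # built to depend on "fine"

rv_fine = rv_coefficient(fine, met)
rv_coarse = rv_coefficient(coarse, met)
print(f"RV(fine, met)   = {rv_fine:.3f}")
print(f"RV(coarse, met) = {rv_coarse:.3f}")
```

Because the synthetic meteorological block is constructed from the synthetic fine block, the first RV value comes out larger than the second; with real data the comparison would of course depend on the measurements. RV lies between 0 and 1 and is unchanged when either block is multiplied by a constant, which is why variable scaling within each block, rather than overall block size, is the preprocessing choice that matters.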
