Assessing positive matrix factorization model fit: a new method to estimate uncertainty and bias in factor contributions at the measurement time scale

Abstract. A Positive Matrix Factorization (PMF) receptor model for aerosol pollution source apportionment was fit to a synthetic dataset simulating one year of daily measurements of ambient PM2.5 concentrations, comprising 39 chemical species from nine pollutant sources. A novel method was developed to estimate the uncertainty and bias of the fitted factor contributions at the daily time scale. A circular block bootstrap was used to create replicate datasets, and the same receptor model was fit to each replicate. Neural networks were trained to classify factors by their chemical profiles, rather than by correlating contribution time series, and this classification was used to align factor orderings across the results from the replicate datasets. Factor contribution uncertainty was assessed from the distribution of results associated with each factor, and bias was assessed by comparing modeled factors with the input factors used to create the synthetic data. The results indicate that variability in factor contribution estimates does not necessarily encompass model error: contribution estimates can vary little across replicates yet still be strongly biased. These findings are likely to depend on characteristics of the data.
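The resampling step described in the abstract can be illustrated with a minimal sketch of a circular block bootstrap over the daily time index. This is not the authors' implementation: the function name, the block length of 7 days, and the random placeholder data are assumptions made only for illustration. The dataset dimensions (365 days, 39 species) follow the abstract.

```python
import random


def circular_block_bootstrap_indices(n_obs, block_length, rng):
    """Return n_obs resampled time indices built from wrapped blocks.

    Each block starts at a uniformly random day and runs for
    block_length consecutive days, wrapping from the last day back
    to the first (the "circular" part). Blocks are concatenated and
    the result is truncated to the original series length.
    """
    n_blocks = -(-n_obs // block_length)  # ceiling division
    indices = []
    for _ in range(n_blocks):
        start = rng.randrange(n_obs)
        indices.extend((start + k) % n_obs for k in range(block_length))
    return indices[:n_obs]


# Placeholder data: 365 daily samples of 39 species (random values,
# standing in for the synthetic PM2.5 dataset).
rng = random.Random(0)
n_days, n_species = 365, 39
data = [[rng.random() for _ in range(n_species)] for _ in range(n_days)]

# Block length of 7 is a hypothetical choice, not taken from the paper.
idx = circular_block_bootstrap_indices(n_days, block_length=7, rng=rng)

# Whole rows are resampled, so each day's 39-species profile stays intact.
replicate = [data[i] for i in idx]
```

Resampling whole days in contiguous blocks preserves both the within-day chemical composition and short-range temporal dependence. The receptor model would then be refit to each `replicate`, and the spread of the resulting factor contributions summarized per factor.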
