Statistical techniques for analyzing soil vapor intrusion data: A case study of manufactured gas plant sites

As part of an ongoing study of soil vapor intrusion (SVI), concentration data for approximately 2000 air and vapor samples were assembled from remedial site investigations and stand-alone assessments conducted at New York State (NYS) Manufactured Gas Plant (MGP) sites. Samples were collected from ambient outdoor air, from indoor air, from beneath building slabs, and from outside buildings. Despite the large sample size, the considerable variability in compound- and sample-specific censoring limits (the detection or reporting limits below which a result is reported only as a nondetect) inhibited the use of conventional tools for statistical interpretation. This paper describes the development and application of improved statistical tools to address an unusually high degree of data censoring and possible artifacts arising from the uneven distribution of samples across sites and buildings. In addition to methods for calculating population percentiles and their confidence intervals, methods for comparing the population of MGP-SVI data with a reference population were developed and evaluated through illustrative comparisons with the published 2001 EPA Building Assessment Survey and Evaluation (BASE) study of industrial buildings. The focus of this work is the development and evaluation of new statistical methods; a more complete summary and evaluation of the full NYS MGP-SVI data set will be presented in a companion paper.

Implications: Data from vapor intrusion and other environmental studies are often stratified and/or censored, which complicates comparisons with background data or reference populations. In some cases, statistical methods for censored data can be modified to support population-based inference and to reduce biases associated with repeated measurements from multiple sources, such as several samples from the same building or site. Such modifications are particularly appropriate for retrospective data-mining studies that are not guided by a formal experimental design.

Supplemental Materials: Supplemental materials are available for this paper.
Go to the publisher's online edition of the Journal of the Air & Waste Management Association.
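The percentile and confidence-interval problem described in the abstract can be illustrated with a small sketch. It assumes a weighted Kaplan-Meier (product-limit) estimator applied directly to left-censored data, with each sample weighted by the reciprocal of the number of samples from its building so that heavily sampled buildings do not dominate, plus a bootstrap that resamples whole buildings to give an approximate interval. The weighting scheme, the building identifiers, and the function names (`building_weights`, `weighted_km_cdf`, `km_percentile`, `building_bootstrap_ci`) are hypothetical illustrations, not the procedures developed in this paper. A nondetect is assumed to lie strictly below its limit, and nondetects with limits above the highest detect contribute nothing, a known property of this estimator.

```python
import random
from collections import Counter, defaultdict

# Sketch only: the per-building weighting and the building-level bootstrap
# below are illustrative assumptions, not the procedures developed in the paper.


def building_weights(building_ids):
    """Weight each sample by 1 / (number of samples from its building)."""
    counts = Counter(building_ids)
    return [1.0 / counts[b] for b in building_ids]


def weighted_km_cdf(values, detected, weights):
    """Weighted product-limit estimate of F(t) = P(X <= t) for left-censored data.

    values   -- concentration for a detect, reporting limit for a nondetect
    detected -- True for a detect, False for a nondetect ("< limit")
    weights  -- positive weight for each sample
    Returns (steps, below): steps lists (t, F(t)) at each distinct detected
    value in ascending order; below is F just under the lowest detect, i.e.
    the probability mass that censoring leaves unresolved.
    """
    detect_wt = defaultdict(float)
    for v, d, w in zip(values, detected, weights):
        if d:
            detect_wt[v] += w
    steps, cdf = [], 1.0
    for t in sorted(detect_wt, reverse=True):  # walk down from the largest detect
        steps.append((t, cdf))
        # Weighted risk set: samples known to be <= t (detects at or below t,
        # nondetects whose limit is at or below t).
        at_risk = sum(w for v, w in zip(values, weights) if v <= t)
        cdf *= 1.0 - detect_wt[t] / at_risk
    steps.reverse()
    return steps, cdf


def km_percentile(steps, below, p, tol=1e-12):
    """Smallest detected t with F(t) >= p, or None if p falls below the lowest detect."""
    if below >= p - tol:
        return None
    for t, cdf in steps:
        if cdf >= p - tol:
            return t
    return None


def building_bootstrap_ci(values, detected, buildings, p,
                          n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the p-th quantile, resampling whole buildings."""
    rng = random.Random(seed)
    by_building = defaultdict(list)
    for v, d, b in zip(values, detected, buildings):
        by_building[b].append((v, d))
    ids = list(by_building)
    estimates = []
    for _ in range(n_boot):
        vals, dets, labels = [], [], []
        # Each draw is relabelled so a building drawn twice counts as two buildings.
        for draw, b in enumerate(rng.choices(ids, k=len(ids))):
            for v, d in by_building[b]:
                vals.append(v)
                dets.append(d)
                labels.append(draw)
        steps, below = weighted_km_cdf(vals, dets, building_weights(labels))
        q = km_percentile(steps, below, p)
        # A quantile below the lowest detect is recorded as -inf (unbounded below).
        estimates.append(float("-inf") if q is None else q)
    estimates.sort()
    k = int(alpha / 2 * n_boot)
    return estimates[k], estimates[n_boot - 1 - k]
```

For example, with one building (A) sampled three times at 1, 2 and 3 and another (B) sampled once at 10, the unweighted median is 2, whereas the building weights give each building half of the total weight and move the median to 3. Other weighting schemes (for example, by site rather than building) would slot into the same estimator.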

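For comparing MGP-SVI data with a reference population such as BASE, a common starting point for data with multiple detection limits is Gehan's generalized Wilcoxon test, which scores every pair of samples as definitely higher, definitely lower, or undetermined. The sketch below assumes equal sample weights, a normal approximation to the permutation distribution of the statistic, and that a nondetect lies strictly below its limit. It does not reproduce the paper's adjustments for repeated measurements, and the function name `gehan_test` is hypothetical.

```python
import math


def gehan_test(x, x_detected, y, y_detected):
    """Gehan's generalized Wilcoxon test for two left-censored samples.

    Values are concentrations for detects and reporting limits for
    nondetects; a nondetect is assumed to lie strictly below its limit.
    Returns (W, z, p): W > 0 means sample x tends to be higher than y,
    z uses the permutation variance of W, and p is a two-sided
    normal-approximation p-value. All samples are weighted equally
    (an assumption; clustering by building is ignored here).
    """
    pooled = list(zip(x, x_detected)) + list(zip(y, y_detected))

    def compare(a, b):
        """+1 if a is definitely larger than b, -1 if definitely smaller, else 0."""
        (va, a_det), (vb, b_det) = a, b
        if a_det and b_det:
            return (va > vb) - (va < vb)
        if a_det:                      # b < vb, so a is larger whenever va >= vb
            return 1 if va >= vb else 0
        if b_det:                      # a < va, so a is smaller whenever vb >= va
            return -1 if vb >= va else 0
        return 0                       # two nondetects cannot be ordered

    # Score of each sample: (# definitely below it) - (# definitely above it).
    scores = [sum(compare(a, b) for b in pooled) for a in pooled]
    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(scores[:n1])               # equals the sum of pairwise x-vs-y scores
    var = n1 * n2 * sum(s * s for s in scores) / (n * (n - 1))
    z = w / math.sqrt(var) if var > 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p
```

With fully detected samples the test reduces to a rank-sum comparison: values 5, 6 and 7 against 1, 2 and 3 give W = 9 and z = 9/√21 ≈ 1.96. Because within-sample comparisons cancel, W depends only on cross-sample pairs, while the scores from all pairs feed the permutation variance.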