Prediction of High-ozone Events Using GAM, SMOTE, and Tail Dependence Approaches in Texas (2005-2019)

ABSTRACT We test two methods for predicting ozone in the El Paso (ELP) and Houston-Galveston-Brazoria (HGB) regions of Texas from 2005–2019: (1) a Generalized Additive Model (GAM) combined with the Synthetic Minority Over-sampling TEchnique (SMOTE) and (2) a tail dependence regression approach. We also compare the feature selection capabilities of the tail dependence approach to other feature selection methods. We find that the GAM+SMOTE model generally outperforms the GAM-only model when predicting ozone values, particularly above-threshold values. We also find that the tail dependence approach is capable of predicting extreme ozone events, but that algorithmic stability and configuration complexity can make it difficult to operationalize on a broad scale, and that the threshold must be selected carefully. In addition, improved prediction of above-threshold maximum daily 8-hour average ozone (MDA8 O3) tends to come at the cost of below-threshold prediction, which is particularly important if MDA8 O3 trends are of interest. Finally, feature selection via the tail dependence method performs comparably to other machine learning-based feature selection methods, and multiple parameter sets can predict MDA8 O3 with equal success.
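The oversampling idea behind SMOTE can be illustrated with a minimal sketch. It assumes that above-threshold days are treated as the minority class and that synthetic rows are built by interpolating between a minority row and one of its nearest minority neighbours, as in the original SMOTE formulation. The function name `smote`, the toy temperature-ozone data, and the 70 ppb threshold are all hypothetical choices for illustration; this is not the authors' pipeline, and how SMOTE was coupled to the GAM is not specified here.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic rows, each interpolated between a randomly
    chosen minority row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority rows; mask self-distances.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Pick a base row and one of its neighbours for every synthetic sample.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    # Interpolate a random fraction of the way from base to neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Toy data (hypothetical): daily temperature in deg C and ozone in ppb.
rng = np.random.default_rng(1)
temp = rng.uniform(15, 40, 500)
ozone = 25 + 1.2 * temp + rng.normal(0, 8, 500)
data = np.column_stack([temp, ozone])

threshold = 70.0  # hypothetical exceedance threshold, ppb
extreme = data[data[:, 1] > threshold]
synthetic = smote(extreme, n_new=100)
augmented = np.vstack([data, synthetic])
```

Because each synthetic row is a convex combination of two above-threshold rows, its ozone value also lies above the threshold, so the augmented set gives a subsequent regression more weight in the upper tail. That reweighting is one plausible reading of why above-threshold gains can come at the expense of below-threshold accuracy.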
