Machine learning for observation bias correction with application to dust storm data assimilation

Abstract. Data assimilation algorithms rely on the basic assumption that observation errors are unbiased. In practice, however, inconsistent measurements with nontrivial biases or inseparable baselines are unavoidable. The assimilation analysis may then diverge from reality, because data assimilation itself cannot distinguish whether differences between model simulations and observations stem from biased observations or from model deficiencies. Unfortunately, modeling observation biases or baselines that show strong spatiotemporal variability is a challenging task. In this study, we report how data-driven machine learning can be used to perform observation bias correction for data assimilation through a real application: dust emission inversion using PM10 observations. PM10 observations are considered unbiased; however, a bias correction is necessary if they are used as a proxy for dust during dust storms, since they actually represent the sum of dust particles and non-dust aerosols. Two observation bias correction methods have been designed in order to use PM10 measurements as a proxy for dust storm loads under severe dust conditions. The first uses a conventional chemistry transport model (CTM) that simulates the life cycles of non-dust aerosols. The second uses a machine-learning model that describes the relations between regular PM10 and other air quality measurements; it is trained on 2 years of historical samples. The machine-learning-based non-dust model is shown to be in better agreement with observations than the CTM. Dust emission inversion tests have been performed by assimilating either the raw measurements or the bias-corrected dust observations obtained with either the CTM or the machine-learning model. The emission field, surface dust concentration, and forecast skill are evaluated. The worst case is when the original observations are assimilated directly: the forecasts driven by the a posteriori emission in this case even result in larger errors than the reference prediction. This shows the necessity of bias correction in data assimilation. The best results are obtained when using the machine-learning model for bias correction, with the existing measurements used more precisely and the resulting forecasts close to reality.
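
The bias-correction idea in the abstract (estimate the non-dust part of PM10 from other air quality measurements, then subtract it from the observed PM10) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the machine-learning model, so an ordinary least-squares regression stands in for it, and the function names, the two predictor columns, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_non_dust_model(features, pm10):
    """Fit PM10 ~ features + intercept by least squares (stand-in for the ML model).

    `features` holds other air quality measurements from non-dust periods
    (hypothetical columns), `pm10` the matching regular PM10 observations.
    """
    X = np.column_stack([features, np.ones(len(features))])  # append intercept column
    coef, *_ = np.linalg.lstsq(X, pm10, rcond=None)
    return coef

def predict_non_dust(coef, features):
    """Estimate the non-dust PM10 baseline from the other measurements."""
    X = np.column_stack([features, np.ones(len(features))])
    return X @ coef

def dust_proxy(pm10_obs, non_dust_pred):
    """Bias-corrected dust observation: PM10 minus non-dust baseline, floored at zero."""
    return np.clip(pm10_obs - non_dust_pred, 0.0, None)

# Synthetic training data standing in for the historical (non-dust) samples.
rng = np.random.default_rng(0)
train_features = rng.uniform(10, 100, size=(500, 2))
train_pm10 = (1.2 * train_features[:, 0] + 0.3 * train_features[:, 1]
              + 5.0 + rng.normal(0, 1, 500))
coef = fit_non_dust_model(train_features, train_pm10)

# During a (synthetic) dust storm, PM10 is far above the non-dust baseline.
storm_features = np.array([[40.0, 30.0]])
storm_pm10 = np.array([800.0])
corrected = dust_proxy(storm_pm10, predict_non_dust(coef, storm_features))
print(corrected)
```

In this sketch the corrected value, rather than the raw PM10, would be passed to the assimilation as the dust observation. The zero floor is a design choice of the sketch, not something stated in the abstract.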
