Dynamic ensemble mechanisms to improve particulate matter forecasting

Abstract Respirable solid particles and liquid droplets suspended in the air, known as particulate matter (PM), may have a significant impact on human health, urban infrastructure, and natural and agricultural systems. The adverse effects of PM have raised public concern, especially in heavily polluted areas of the world, making it imperative to develop strategies that keep the concentration levels of these pollutants below harmful thresholds. Traditional machine learning approaches have been used to forecast PM concentrations. However, the composition of PM in the atmosphere may involve complex chemical processes and is influenced by many meteorological parameters. Thus, the underlying distributions of continuously collected PM data may evolve over time. This phenomenon, known as concept drift, poses an important challenge for traditional machine learning techniques, since they have no mechanisms to handle changes in data distribution at run time, which limits their forecasting capabilities. The overall goal of this work is to evaluate whether incorporating mechanisms to deal with concept drift, together with online sequential learning approaches, can improve the accuracy of PM forecasting. To do so, new mechanisms that enable online dynamic ensembles to handle and retain knowledge from different concepts for longer were proposed and adapted to the EOS and DOER algorithms, resulting in three approaches: EOS-rank, EOS-D and DOER-rank. These ensemble strategies, which are based on Online Sequential Extreme Learning Machines (OS-ELM), were compared with five algorithms from the literature. To evaluate their performance, real-world and artificial datasets with known dynamic behaviors, as well as PM concentration datasets from different cities of the State of Sao Paulo, Brazil, were used in the experiments.
The obtained results showed that the proposed approaches can handle dynamic environments with different rates of drift and that EOS-rank was able to outperform most approaches from the literature in scenarios with higher rates of drift. The results also indicate that PM data distributions evolve slowly over time and that, consequently, the proposed mechanisms, which retain information about past concepts and adapt the ensemble slowly, tend to yield better results when applied to forecasting PM concentration.
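For readers unfamiliar with the base learner named above, the following is a minimal sketch of an OS-ELM regressor evaluated in a test-then-train (prequential) loop. It is not the authors' implementation and does not reproduce EOS-rank, EOS-D or DOER-rank. The class name `OSELMRegressor`, the helper `prequential_rmse`, the sigmoid activation, the hidden-layer size and the small `ridge` term are all assumptions chosen for illustration.

```python
import numpy as np


class OSELMRegressor:
    """Illustrative OS-ELM sketch: random fixed hidden layer, recursive
    least-squares update of the output weights."""

    def __init__(self, n_inputs, n_hidden=20, ridge=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer weights and biases are drawn once and never trained.
        self.W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        # Assumption: a small ridge term keeps the initial inverse well conditioned.
        self.ridge = ridge
        self.P = None      # inverse of the (regularized) hidden-output Gram matrix
        self.beta = None   # output weights

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_initial(self, X0, y0):
        # Batch initialization on the first block of samples.
        H = self._hidden(X0)
        self.P = np.linalg.inv(H.T @ H + self.ridge * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ y0
        return self

    def partial_fit(self, X, y):
        # Sequential update with a new chunk, without revisiting old data.
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (y - H @ self.beta)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def prequential_rmse(model, X, y, chunk=1):
    """Test-then-train: predict each chunk before learning from it."""
    sq_err = []
    for start in range(0, len(X), chunk):
        Xc, yc = X[start:start + chunk], y[start:start + chunk]
        sq_err.extend((model.predict(Xc) - yc) ** 2)
        model.partial_fit(Xc, yc)
    return float(np.sqrt(np.mean(sq_err)))
```

A plain OS-ELM of this kind weights all past samples equally, so it has no built-in way to forget an outdated concept; that limitation is what motivates wrapping such learners in a dynamic ensemble, as the abstract describes.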
