Application of k-means and hierarchical clustering techniques for analysis of air pollution: A review (1980–2019)

Abstract Clustering is an exploratory data analysis technique used to investigate the underlying structure of data. It is described as the grouping of objects that share similar characteristics. Over the past 50 years, clustering has been widely applied to atmospheric science data, in particular climate and meteorological data. Air pollution studies began employing clustering techniques in the 1980s and have done so successfully since; the aim of this paper is to review such studies. In particular, two well-known and commonly used clustering methods, k-means and hierarchical agglomerative clustering, are reviewed as applied in air pollution studies. Air pollution data from two sources are included: ground-based monitoring stations and air mass trajectories depicting pollutant pathways. Much of the reviewed research focuses on the spatio-temporal characteristics of air pollutants and on pollutant behavior in terms of source, transport pathways, apportionment and links to meteorological conditions. A total of 100 research articles published during 1980–2019 were included. The discussion centers on the purpose of the clustering approach, the specific technique used and the data to which it was applied. Overall, k-means was used extensively among the studies, while average and Ward linkages were the most frequently applied hierarchical clustering techniques. Reviews of clustering techniques applied in air pollution studies are currently lacking, and this paper aims to fill that gap. To the best of the authors' knowledge, this is the first review dedicated to clustering applications in air pollution studies, and it covers the longest time span (1980–2019).
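To make the two method families named in the abstract concrete, the following is a minimal sketch, not the procedure of any reviewed study. It assumes a hypothetical set of six monitoring stations, each described by an invented pair of mean concentrations (PM10, NO2), and groups them with a plain k-means loop and with agglomerative clustering under average linkage. The station values, function names and the choice of the first k points as starting centroids are all assumptions made for illustration; Ward linkage is not shown.

```python
from math import dist

# Hypothetical station profiles: (mean PM10, mean NO2). Invented values, not real data.
stations = [(20.0, 15.0), (22.0, 17.0), (21.0, 14.0),
            (60.0, 45.0), (62.0, 48.0), (58.0, 44.0)]

def kmeans(points, k, iters=100):
    """Plain k-means; starting centroids are the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids

def average_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda xy: avg_dist(clusters[xy[0]], clusters[xy[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels

km_labels, km_centroids = kmeans(stations, k=2)
hc_labels = average_linkage(stations, k=2)
print("k-means labels:        ", km_labels)
print("average-linkage labels:", hc_labels)
```

On this toy input both approaches separate the three low-concentration stations from the three high-concentration ones. With real monitoring data the two methods need not agree, and the number of clusters would have to be chosen and validated rather than fixed in advance.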
