Application of fuzzy c-means clustering to PRTR chemicals uncovering their release and toxicity characteristics.

The increasing manufacture and use of chemicals have not been matched by a corresponding increase in our understanding of their risks. The pollutant release and transfer register (PRTR) is becoming a popular instrument for collecting chemical data and strengthening the public's right to know. However, these data are usually high-dimensional, which restricts their wider use. The present study partitions Japanese PRTR chemicals into five fuzzy clusters by fuzzy c-means clustering (FCM) to explore the implicit information. Each chemical belongs to every cluster with a certain membership degree. Cluster I features high releases from non-listed industries and the household sector, and high environmental toxicity. Cluster II is characterized by high reported releases and transfers from 24 listed industries above the threshold, mutagenicity, and high environmental toxicity. Chemicals in cluster III are characterized by high releases from non-listed industries and low toxicity. Cluster IV is characterized by high reported releases and transfers from 24 listed industries above the threshold and extremely high environmental toxicity. Cluster V is characterized by low releases, yet mutagenicity and high carcinogenicity. The chemicals with the highest membership degree in each cluster were identified as its representatives. For half of the chemicals, the highest membership degree exceeds 0.74. When the highest and the second highest membership degrees are considered together, about 94% of the chemicals have a value higher than 0.5. FCM can serve as an approach to uncover the implicit information in highly complex chemical datasets, which in turn supports the development of strategies for efficient and effective chemical management.
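To make the membership-degree idea concrete, the following is a minimal sketch of the standard FCM update rules in Python with NumPy. It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the random initialization, and the synthetic four-feature data are all assumptions chosen for illustration, and reading "considered together" as the sum of the two largest memberships is only one possible interpretation of the abstract.

```python
import numpy as np

def fuzzy_c_means(X, c=5, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Minimal fuzzy c-means (illustrative, not the study's code).

    Returns (centers, U), where U[i, k] is the membership degree
    of sample i in cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized per sample.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Synthetic stand-in for a chemicals-by-features matrix (hypothetical data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=mu, scale=0.3, size=(30, 4)) for mu in range(5)])

centers, U = fuzzy_c_means(X, c=5)

# Representative of each cluster: the sample with the highest membership in it.
representatives = U.argmax(axis=0)

# Summary statistics analogous to those reported in the abstract.
U_sorted = np.sort(U, axis=1)[:, ::-1]
median_highest = np.median(U_sorted[:, 0])
# Assumed reading: sum of the two largest memberships per sample.
share_top_two = np.mean(U_sorted[:, 0] + U_sorted[:, 1] > 0.5)

print("representatives:", representatives)
print("median highest membership:", median_highest)
print("share with top-two sum > 0.5:", share_top_two)
```

In practice the features would be standardized first, since release quantities and toxicity scores are on very different scales, and the number of clusters would be chosen with a validity index rather than fixed in advance.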
