Advances in self-organizing maps for their application to compositional data

A self-organizing map (SOM) is a non-linear projection of a D-dimensional data set onto a lower-dimensional space in which the distances among observations are approximately preserved. The SOM arranges multivariate data according to their similarity to each other, which supports pattern recognition and makes higher-dimensional data easier to interpret. The SOM algorithm allows for the selection of different map topologies, distances, and parameters, which determine how the data are organized on the map. In the particular case of compositional data (such as elemental, mineralogical, or maceral abundance), the sample space is governed by Aitchison geometry, and extra steps are required prior to SOM analysis. Following the principle of working on log-ratio coordinates, the simplicial operations and the Aitchison distance, which are the appropriate elements for the SOM, are presented. With this structure developed, a SOM using Aitchison geometry is applied to interpret elemental data from combustion products (bottom ash, fly ash, and economizer fly ash) from a Wyoming coal-fired power plant. The results provide knowledge about the differences among the ash compositions produced in the coal combustion process.
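The elements named above (log-ratio coordinates, the Aitchison distance, and the simplicial operations) can be illustrated with a minimal sketch. It assumes strictly positive parts with no zeros and uses the centered log-ratio (clr) transform as the coordinate system. The function names `closure`, `clr`, `aitchison_distance`, `best_matching_unit`, and `update_unit` are hypothetical helpers chosen for illustration; this is not the authors' implementation.

```python
import math


def closure(x, total=1.0):
    """Rescale a vector of positive parts so that it sums to `total`."""
    s = sum(x)
    return [total * v / s for v in x]


def clr(x):
    """Centered log-ratio coordinates: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance between clr vectors."""
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


def best_matching_unit(sample, codebook):
    """Index of the map unit closest to `sample` in Aitchison distance."""
    return min(
        range(len(codebook)),
        key=lambda i: aitchison_distance(sample, codebook[i]),
    )


def update_unit(unit, sample, rate):
    """Move a unit toward a sample inside the simplex.

    Uses perturbation and powering: unit (+) rate (.) (sample (-) unit),
    which in raw parts is closure(unit * (sample / unit) ** rate).
    Assumption: `rate` combines learning rate and neighborhood weight.
    """
    moved = [u * (s / u) ** rate for u, s in zip(unit, sample)]
    return closure(moved)
```

Because the distance depends only on ratios between parts, `aitchison_distance([1, 2, 3], [10, 20, 30])` is zero up to rounding, so the same composition reported in different units maps to the same unit. With `rate = 1` the update lands on the (closed) sample, and with `rate = 0` the unit stays where it is.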
