Optimal Chemical Grouping and Sorbent Material Design by Data Analysis, Modeling and Dimensionality Reduction Techniques

The ultimate goal of the Texas A&M Superfund program is to develop comprehensive tools and models for addressing exposure to chemical mixtures during contamination events caused by environmental emergencies. Toward that goal, we aim to design a framework that optimally groups chemical mixtures based on their chemical characteristics and bioactivity properties, and that facilitates comparative assessment of their human health impacts through read-across. The optimal grouping of the mixtures then guides the selection of sorption materials so that the adverse health effects of each group are mitigated. Here, we perform (i) hierarchical clustering of complex substances using chemical and biological data, and (ii) predictive modeling of the sorption activity of broad-acting materials via regression techniques. Dimensionality reduction techniques are also incorporated to further improve the results. As benchmark complex substances, we adopt several recent examples of substances of Unknown or Variable composition, Complex reaction products, or Biological materials (UVCB), and we optimize their grouping by maximizing the Fowlkes-Mallows (FM) index. We also show that the choice of clustering method and of visualization technique influences how the groupings are communicated for read-across.
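To make the FM-based optimization concrete, the following is a minimal sketch, not the authors' pipeline. It assumes average linkage with Euclidean distance, hypothetical two-dimensional descriptor vectors, and hypothetical reference group labels; the abstract does not specify the linkage, distance, or descriptors. The dendrogram is cut at each candidate number of clusters `k`, and the cut with the highest FM index against the reference grouping is kept. The sketch is written in plain Python so it stays self-contained. In practice, `scipy.cluster.hierarchy.linkage`/`fcluster` and `sklearn.metrics.fowlkes_mallows_score` provide the same building blocks.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """FM index = TP / sqrt((TP + FP) * (TP + FN)), counted over all item pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1          # pair grouped together in both partitions
        elif same_pred:
            fp += 1          # together only in the predicted clustering
        elif same_true:
            fn += 1          # together only in the reference grouping
    denom = sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom else 0.0


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_cut(points, k):
    """Agglomerative clustering (average linkage, assumed) stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean pairwise distance between members of the two clusters.
            d = sum(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]      # safe because a < b from combinations()
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def best_cut(points, reference, k_range):
    """Return the k whose dendrogram cut maximizes the FM index, plus all scores."""
    scores = {k: fowlkes_mallows(reference, average_linkage_cut(points, k))
              for k in k_range}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical descriptors for six substances and a hypothetical reference grouping.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
reference = ["A", "A", "A", "B", "B", "B"]
best_k, scores = best_cut(points, reference, range(1, 6))
print(best_k, scores[best_k])
```

With these toy inputs, the two-cluster cut reproduces the reference grouping exactly. Selecting `k` by maximum FM index is one straightforward way to "optimize the grouping." Searching over linkage methods or distance metrics in the same loop would follow the same pattern.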