Supervised machine learning for source allocation of per- and polyfluoroalkyl substances (PFAS) in environmental samples.

Environmental contamination by per- and polyfluoroalkyl substances (PFAS) is widespread because of both their decades of use and their persistence in the environment. These factors can make it challenging to identify the source of contamination in a sample, because the contamination may date back decades or originate from any of several candidate sources. Forensic source allocation is important for delineating plumes and may also provide insight into the environmental behavior of specific PFAS components. This paper explores the use of supervised machine learning classifiers to allocate the source of PFAS contamination based on patterns in component concentrations. A dataset of PFAS component concentrations in 1197 environmental water samples was assembled from sites around the world. The dataset was split roughly evenly into training and test sets, and the 598-sample training set was used to train four classifiers: three conventional machine learning classifiers (Extra Trees, Support-Vector Machines, K-Neighbors) and one multilayer perceptron feedforward deep neural network. Of the methods tested, the deep neural network and Extra Trees performed particularly well at classifying samples from a range of sources. That methods built on completely different principles produce similar predictions supports the hypothesis that PFAS water sample data contain patterns that allow forensic source allocation. The results suggest that supervised machine learning has substantial promise as a tool for forensic source allocation.
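The workflow described above (split the samples, train four classifiers, compare test accuracy) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic log-concentration data in place of the real dataset, and the function names, number of PFAS components, number of source classes, and all hyperparameters are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_pfas(n_samples=1197, n_components=12, n_sources=4, seed=0):
    """Generate stand-in data (NOT real PFAS measurements).

    Each hypothetical source class gets its own mean log-concentration
    profile; samples are noisy copies of their class profile.
    """
    rng = np.random.default_rng(seed)
    profiles = rng.normal(loc=2.0, scale=1.5, size=(n_sources, n_components))
    labels = rng.integers(0, n_sources, size=n_samples)
    noise = rng.normal(scale=0.5, size=(n_samples, n_components))
    return profiles[labels] + noise, labels


def train_and_score(X, y, seed=0):
    """Train the four classifier types and return test-set accuracy for each."""
    # 598 training samples, as in the abstract; the remainder form the test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=598, stratify=y, random_state=seed
    )
    # Hyperparameters are illustrative assumptions, not values from the paper.
    models = {
        "extra_trees": ExtraTreesClassifier(n_estimators=200, random_state=seed),
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "k_neighbors": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)
        ),
        "mlp": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000,
                          random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_pfas()
    for name, acc in train_and_score(X, y).items():
        print(f"{name}: test accuracy = {acc:.3f}")
```

The scale-sensitive models (SVM, K-Neighbors, MLP) are wrapped in a pipeline with standardization so that no single component dominates the distance or gradient computations; Extra Trees is left unscaled because tree splits are invariant to monotonic feature scaling. Accuracy on this synthetic data says nothing about performance on real PFAS samples.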
