Multi-component background learning automates signal detection for spectroscopic data

Automated experimentation has yielded data acquisition rates that exceed human processing capabilities. Artificial intelligence offers new possibilities for automating data interpretation to generate large, high-quality datasets. Background subtraction is a long-standing challenge, particularly in settings where multiple sources of background signal coexist, and automatically extracting the signals of interest from measured signals accelerates data interpretation. Herein, we present an unsupervised probabilistic learning approach that analyzes large data collections to identify multiple background sources and to establish the probability that any given data point contains a signal of interest. The approach is demonstrated on X-ray diffraction and Raman spectroscopy data and is suitable for any type of data in which the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals, since the shapes of the background signals, the noise levels, and the signal of interest are learned simultaneously via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets, a transformative capability with many applications in the physical sciences and beyond.
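To make the idea concrete, the following is a minimal toy sketch and not the model described in the abstract. It assumes a low-rank background, Gaussian noise, and a signal that only adds positive intensity. The probabilistic matrix factorization is replaced here by a truncated SVD with EM-style imputation, and the signal likelihood is assumed to be uniform over positive residuals. The function name `fit_background`, the parameter `prior_signal`, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_background(X, rank=2, n_iter=30, prior_signal=0.05):
    """Toy EM-style loop: low-rank background + Gaussian noise + positive signal.

    Assumption: this is an illustrative stand-in, not the published model.
    """
    B = np.zeros_like(X)   # current background estimate
    p = np.zeros_like(X)   # per-point probability of containing signal
    sigma = 1.0
    for _ in range(n_iter):
        # Background step: replace likely-signal points by the current
        # background estimate, then take the best rank-`rank` approximation.
        filled = (1.0 - p) * X + p * B
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        B = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Noise step: robust scale of the residuals (median absolute deviation).
        R = X - B
        sigma = 1.4826 * np.median(np.abs(R - np.median(R))) + 1e-12

        # Probability step: Gaussian noise versus a signal that is assumed
        # uniform on (0, span] and impossible for non-positive residuals.
        span = max(R.max(), sigma)
        noise_like = np.exp(-0.5 * (R / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        signal_like = np.where(R > 0, 1.0 / span, 0.0)
        num = prior_signal * signal_like
        p = num / (num + (1.0 - prior_signal) * noise_like + 1e-300)
    return B, p, sigma

# Synthetic data set: two background sources with varying weights,
# sparse positive peaks, and Gaussian noise (all values are made up).
rng = np.random.default_rng(0)
n, m = 200, 300
x = np.linspace(0.0, 1.0, m)
shapes = np.vstack([np.exp(-3.0 * x), 1.0 + x ** 2])
weights = rng.uniform(0.5, 2.0, size=(n, 2))
background = weights @ shapes

peaks = np.zeros((n, m))
rows = rng.integers(0, n, size=150)
cols = rng.integers(5, m - 5, size=150)
for r, c in zip(rows, cols):
    peaks[r] += np.exp(-0.5 * ((np.arange(m) - c) / 2.0) ** 2)

X = background + peaks + rng.normal(0.0, 0.02, size=(n, m))
B, p, sigma = fit_background(X)
```

In this sketch, thresholding `p` (for example `p > 0.5`) gives a mask of points likely to contain signal, and `X - B` gives the background-subtracted data. The one-sided signal likelihood is what encodes the assumption that the signal of interest is a positive addition to the background.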
