Machine learning methods for quantitative analysis of Raman spectroscopy data

The automated identification and quantification of illicit materials using Raman spectroscopy is of significant importance for law enforcement agencies. This paper compares Machine Learning (ML) methods with standard statistical regression techniques for developing automated identification methods. The ML task is broken into two sub-tasks: data reduction and prediction. In well-conditioned data, the number of samples should be much larger than the number of attributes per sample, to limit the degrees of freedom in predictive models. In spectroscopy data, the opposite is normally true: each spectrum has many more attributes than there are samples. Predictive models based on such data have a high number of degrees of freedom, which increases the risk of over-fitting to the sample data and of poor predictive power. For the data reduction sub-task, an approach based on Genetic Algorithms is described. For the prediction sub-task, the objective is to estimate the concentration of a component in a mixture, based on its Raman spectrum and the known concentrations of previously seen mixtures. Here, Neural Networks and k-Nearest Neighbours are used for prediction. Preliminary results are presented for the problem of estimating the concentration of cocaine in solid mixtures, and are compared with previously published results in which statistical analysis of the same dataset was performed. Finally, this paper demonstrates how more accurate results may be achieved by using an ensemble of prediction techniques.
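The two sub-tasks can be illustrated with a minimal sketch, assuming a binary-mask Genetic Algorithm for data reduction and k-Nearest Neighbours regression for prediction. The abstract does not give the encoding, fitness function, or parameters that were used, so everything below is an assumption made for illustration: the leave-one-out RMSE fitness, the tournament selection, the uniform crossover, the mutation rate, the function names `knn_loo_rmse` and `ga_select`, and the synthetic "spectra" (random noise with five informative channels), which stand in for real Raman data.

```python
import numpy as np

def knn_loo_rmse(X, y, mask, k=3):
    """Leave-one-out RMSE of k-NN regression using only the masked columns."""
    if not mask.any():
        return np.inf  # an empty feature set cannot predict anything
    Xs = X[:, mask]
    # Pairwise squared Euclidean distances between samples.
    d = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    pred = y[nearest].mean(axis=1)  # average concentration of the k neighbours
    return float(np.sqrt(np.mean((pred - y) ** 2)))

def ga_select(X, y, pop_size=20, generations=15, p_mut=0.02, seed=0):
    """Search for a boolean channel mask that minimises knn_loo_rmse."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5
    pop[0] = True  # include the "use every channel" baseline in the population
    for _ in range(generations):
        cost = np.array([knn_loo_rmse(X, y, m) for m in pop])
        children = [pop[np.argmin(cost)].copy()]  # elitism: keep the best mask
        while len(children) < pop_size:
            parents = []
            for _ in range(2):
                # Tournament of size 2: the lower-cost candidate becomes a parent.
                a, b = rng.integers(pop_size, size=2)
                parents.append(pop[a] if cost[a] < cost[b] else pop[b])
            # Uniform crossover followed by bit-flip mutation.
            child = np.where(rng.random(n_feat) < 0.5, parents[0], parents[1])
            child ^= rng.random(n_feat) < p_mut
            children.append(child)
        pop = np.array(children)
    cost = np.array([knn_loo_rmse(X, y, m) for m in pop])
    best = int(np.argmin(cost))
    return pop[best], float(cost[best])

# Synthetic stand-in data (NOT the cocaine dataset): more channels than samples.
rng = np.random.default_rng(1)
n_samples, n_channels = 40, 60
conc = rng.uniform(0.0, 1.0, n_samples)           # hypothetical concentrations
X = rng.normal(0.0, 1.0, (n_samples, n_channels))  # noise in every channel
X[:, 10:15] += 5.0 * conc[:, None]                 # five informative channels

mask, rmse = ga_select(X, conc)
baseline = knn_loo_rmse(X, conc, np.ones(n_channels, dtype=bool))
print(f"channels kept: {int(mask.sum())} of {n_channels}")
print(f"LOO RMSE with all channels: {baseline:.3f}, with GA mask: {rmse:.3f}")
```

Because the all-channels mask is placed in the initial population and the best mask is carried forward unchanged each generation, the selected mask in this sketch can never score worse than the unreduced data on the leave-one-out criterion. That criterion is itself computed on the training samples, so a held-out test set would still be needed to judge predictive power.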
