Evaluation of Statistical Methods for Classification of Laser-Induced Breakdown Spectroscopy (LIBS) Data

When NASA’s Curiosity rover lands in August 2012, it will use a laser-induced breakdown spectroscopy (LIBS) instrument to collect data in an effort to understand the chemical composition and geological classification of the rocks on Mars. This is part of a larger endeavor to determine information about the planet’s habitability. LIBS is a method used to determine the elemental composition of a given sample. For each rock sample analyzed by the instrument, a LIBS spectrum consisting of over 6,000 different channels is obtained. To prepare for the return of LIBS data from the rover, this project evaluates the accuracy of statistical methods, such as discriminant analysis, support vector machines, and clustering algorithms, for categorizing rock samples into groups with similar chemical compositions based on their LIBS spectra alone. Accurate classification is critical for rapid identification of similar unknown samples, for novelty detection, and for selecting a training set of data to use in estimating chemical compositions. Similar studies have been performed; however, they generally do not follow statistical best practices and therefore report overly optimistic results. The data used in this project are from the “century set”, a suite of 100 igneous rock samples. These 100 samples are the only ones currently available for this project that have both LIBS spectra and known chemical compositions. Having the known chemical compositions allowed the century set samples to be divided into groups with geological similarities based on their Total Alkali-Silica (TAS) classes, and provided a way to evaluate the predictive accuracy of the classification algorithms using K-fold cross validation. The results show that the small sample size and the uneven distribution of samples across TAS classes make classification into many groups difficult, contradicting many of the outcomes reported in the literature. However, some of the methods explored in this thesis do show promise based on their performance in simpler classification tasks, so the results should be reevaluated once more data are obtained. Because LIBS data are scarce, this thesis also briefly explores the results from one method of simulating a LIBS spectrum based on the sample’s chemical composition. Simulated data could be used to examine the effects of sample size on the accuracies of the various classification algorithms.
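The K-fold cross validation mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the thesis's procedure: the helper names are hypothetical, the nearest-centroid classifier is a simple stand-in for the discriminant analysis and support vector machine methods, the folds are plain shuffled folds without stratification, and the two-class, four-channel data are synthetic toy values rather than LIBS spectra.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Return {label: mean vector} computed from the training samples."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def nearest_centroid_predict(centroids, row):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(center, row))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Hold out each fold once, train on the rest, and pool the correct predictions."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        centroids = nearest_centroid_fit([X[i] for i in train],
                                         [y[i] for i in train])
        correct += sum(nearest_centroid_predict(centroids, X[i]) == y[i]
                       for i in fold)
    return correct / len(X)


# Synthetic toy data (an assumption): two well-separated classes, 4 channels each.
rng = random.Random(1)
X = [[base + rng.uniform(-1.0, 1.0) for _ in range(4)]
     for base in [0.0] * 10 + [10.0] * 10]
y = ["A"] * 10 + ["B"] * 10

print(cross_validated_accuracy(X, y, k=5))
```

Because every sample is scored only by a model that never saw it during training, the pooled accuracy estimates out-of-sample performance. With small and unevenly distributed classes, a plain shuffle like this one can leave a class nearly or entirely absent from a training split, so a stratified split that preserves class proportions in each fold may be preferable.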
