Bayesian nonparametric classification for spectroscopy data

High-dimensional spectroscopy data are increasingly common in many fields of science. Building classification models in this context is challenging, owing not only to the high dimensionality but also to the strong autocorrelation in the data. A two-stage classification strategy is proposed. First, in a data pre-processing step, the dimensionality of the data is reduced using one of two distinct methods. The output of either method is then fed to a classification procedure that uses a multivariate density estimate from a Bayesian nonparametric mixture model for discrimination. The model is based on a random probability measure with decreasing weights, a nonparametric prior chosen to ease the identifiability and label-switching problems inherent in such models. This simple and flexible classification strategy is applied to the well-known ‘meat’ data set. The results are similar to or better than those previously reported in the literature for the same data.
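The second stage can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the decreasing weights take a geometric form, w_j = lam * (1 - lam)^(j - 1), truncated and renormalised, and it plugs in fixed, made-up component means and a shared isotropic variance instead of running posterior inference. The names `geometric_weights`, `mixture_density` and `classify`, and all numerical values, are hypothetical. The inputs are assumed to be low-dimensional scores already produced by the pre-processing step.

```python
import numpy as np


def geometric_weights(lam, n_components):
    """Truncated geometric weights w_j proportional to lam * (1 - lam)**(j - 1).

    Assumption: a geometric form is one example of decreasing weights;
    the truncated weights are renormalised to sum to one.
    """
    j = np.arange(n_components)
    w = lam * (1.0 - lam) ** j
    return w / w.sum()


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var * I) evaluated at each row of x."""
    d = x.shape[1]
    sq_dist = np.sum((x - mean) ** 2, axis=1)
    return np.exp(-0.5 * sq_dist / var) / (2.0 * np.pi * var) ** (d / 2.0)


def mixture_density(x, weights, means, var):
    """Weighted sum of isotropic Gaussian densities at each row of x."""
    return sum(w * gaussian_pdf(x, m, var) for w, m in zip(weights, means))


def classify(x, class_models, priors):
    """Assign each row of x to the class maximising prior * class density."""
    scores = np.column_stack(
        [p * mixture_density(x, *model) for p, model in zip(priors, class_models)]
    )
    return scores.argmax(axis=1)


# Hypothetical two-class example on two-dimensional reduced scores.
w = geometric_weights(0.5, 4)
class_models = [
    (w, np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]), 1.0),
    (w, np.array([[5.0, 5.0], [4.0, 5.0], [5.0, 4.0], [4.0, 4.0]]), 1.0),
]
priors = np.array([0.5, 0.5])
x_new = np.array([[0.2, -0.1], [4.8, 5.3]])
labels = classify(x_new, class_models, priors)
```

Because the weights are strictly decreasing, the mixture components have a fixed ordering by weight, which is the intuition behind easing label switching. In a full analysis the fixed parameters above would be replaced by a density estimate from the posterior of the nonparametric mixture.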
