Model Population Analysis for Statistical Model Comparison

Model comparison plays a central role in statistical learning and chemometrics. The performance of a model must be assessed with a chosen criterion, on the basis of which different models can then be compared. A variety of criteria are available for model assessment, such as Akaike's information criterion (AIC) [1], the Bayesian information criterion (BIC) [2], the deviance information criterion (DIC), Mallows' Cp statistic, and cross validation [3-6], and a large body of literature is devoted to them. With the aid of a chosen criterion, different models can be compared; for example, if AIC or BIC is chosen for model assessment, the model with the smaller AIC or BIC is preferred.
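To make this preference rule concrete, the sketch below is a minimal illustration (not taken from the original work): it fits two ordinary least squares models to simulated data and compares their AIC and BIC values under a Gaussian-error assumption. The data, the candidate model forms, and the helper function information_criteria are hypothetical and serve only to show that the model with the smaller criterion value is retained.

```python
# Minimal sketch (assumption: Gaussian errors, models fit by ordinary least
# squares on simulated data) of comparing two candidate models by AIC/BIC.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)   # true relation is linear

def information_criteria(X, y):
    """Return (AIC, BIC) for an OLS fit, up to an additive constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n_obs, k = X.shape
    aic = n_obs * np.log(rss / n_obs) + 2 * k
    bic = n_obs * np.log(rss / n_obs) + k * np.log(n_obs)
    return aic, bic

X1 = np.column_stack([np.ones(n), x])               # linear model
X2 = np.column_stack([np.ones(n), x, x**2, x**3])   # cubic model (over-parameterized)

for name, X in [("linear", X1), ("cubic", X2)]:
    aic, bic = information_criteria(X, y)
    print(f"{name:>6}: AIC = {aic:8.2f}, BIC = {bic:8.2f}")
# The model with the smaller AIC/BIC (here typically the linear one) is preferred.
```

On data of this kind the simpler linear model usually attains the smaller AIC and BIC, so it is the one selected under either criterion.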

[1] Hongdong Li, et al., Identification of free fatty acids profiling of type 2 diabetes mellitus and exploring possible biomarkers by GC–MS coupled with chemometrics, 2010, Metabolomics.

[2] G. Schwarz, Estimating the Dimension of a Model, 1978.

[3] M. Stone, Continuum regression: Cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares and principal components regression, 1990.

[4] Leo Breiman, et al., Bagging Predictors, 1996, Machine Learning.

[5] Yoav Freund, et al., Experiments with a New Boosting Algorithm, 1996, ICML.

[6] Jouko Lampinen, et al., Bayesian Model Assessment and Comparison Using Cross-Validation Predictive Densities, 2002, Neural Computation.

[7] S. Wold, et al., PLS-regression: a basic tool of chemometrics, 2001.

[8] Age K. Smilde, et al., Assessment of PLSDA cross validation, 2008.

[9] D. Massart, et al., Elimination of uninformative variables for multivariate calibration, 1996, Analytical Chemistry.

[10] R. Yu, et al., An ensemble of Monte Carlo uninformative variable elimination for wavelength selection, 2008, Analytica Chimica Acta.

[11] David Gur, et al., A permutation test sensitive to differences in areas for comparing ROC curves from a paired design, 2005, Statistics in Medicine.

[12] H. Akaike, A new look at the statistical model identification, 1974.

[13] P. Filzmoser, et al., Repeated double cross validation, 2009.

[14] Qing-Song Xu, et al., Support vector machines and its applications in chemistry, 2009.

[15] Yang Ai-jun, Bayesian variable selection for disease classification using gene expression data, 2010.

[16] Robert Tibshirani, et al., An Introduction to the Bootstrap, 1994.

[17] Dong-Sheng Cao, et al., Recipe for uncovering predictive genes using support vector machines based on model population analysis, 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[18] Qing-Song Xu, et al., Uncover the path from PCR to PLS via elastic component regression, 2010.

[19] B. Efron, et al., A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation, 1983.

[20] Leo Breiman, et al., Random Forests, 2001, Machine Learning.

[21] Yi-Zeng Liang, et al., Monte Carlo cross validation, 2001.

[22] M. Barker, et al., Partial least squares for discrimination, 2003.

[23] Hongdong Li, et al., Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration, 2009, Analytica Chimica Acta.

[24] Yizeng Liang, et al., Noise incorporated subwindow permutation analysis for informative gene selection using support vector machines, 2011, The Analyst.

[25] M. Stone, Cross-Validatory Choice and Assessment of Statistical Predictions, 1976.

[26] Yi-Zeng Liang, et al., Plasma fatty acid metabolic profiling and biomarkers of type 2 diabetes mellitus based on GC/MS and PLS-LDA, 2006, FEBS Letters.

[27] Elaine Martin, et al., Bayesian linear regression and variable selection for spectroscopic calibration, 2009, Analytica Chimica Acta.

[28] H. B. Mann, et al., On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, 1947.

[29] Yi-Zeng Liang, et al., Classification of vinegar samples based on near infrared spectroscopy combined with wavelength selection, 2011.

[30] Dong-Sheng Cao, et al., Model population analysis for variable selection, 2010.

[31] J. Shao, Linear Model Selection by Cross-validation, 1993.

[32] Robert Sabatier, et al., Selection of discriminant wavelength intervals in NIR spectrometry with genetic algorithms, 2006.

[33] J. Kalivas, Cyclic subspace regression with analysis of the hat matrix, 1999.

[34] Anders Björkström, et al., A Generalized View on Continuum Regression, 1999.

[35] S. Wold, Cross-Validatory Estimation of the Number of Components in Factor and Principal Components Models, 1978.

[36] Geoffrey I. Webb, et al., Feature-subspace aggregating: ensembles for stable and unstable learners, 2011, Machine Learning.

[37] S. Wold, et al., Wavelength interval selection in multicomponent spectral analysis by moving window partial least-squares regression with applications to mid-infrared and near-infrared spectroscopic data, 2002, Analytical Chemistry.

[38] Dong-Sheng Cao, et al., Recipe for revealing informative metabolites based on model population analysis, 2010, Metabolomics.

[39] Dong-Sheng Cao, et al., A new strategy of outlier detection for QSAR/QSPR, 2009, Journal of Computational Chemistry.

[40] W. Cai, et al., A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra, 2008.

[41] K. Varmuza, Chemometrics in Practical Applications, 2012.

[42] Identification of finite impulse response models with continuum regression, 1993.