Model population analysis for variable selection

To build a credible model for chemical, biological, or clinical data, it helps first to gain insight into the data themselves, and then to report statistically stable results derived from a large number of sub-models, all built from the same single dataset with the aid of Monte Carlo sampling (MCS). In the present work, the concept of model population analysis (MPA) is developed. Briefly, MPA can be regarded as a general framework for developing new methods by statistically analyzing parameters of interest (regression coefficients, prediction errors, etc.) across a population of sub-models. New methods are expected to emerge from using these parameters in novel ways. In this work, the elements of MPA are first described. Then, applications of MPA to variable selection and model assessment are emphasized. Copyright © 2010 John Wiley & Sons, Ltd.
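The core idea summarized above (draw many random sub-datasets from one dataset, fit a sub-model on each, then analyze the distribution of a parameter of interest) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes ordinary least squares as the sub-model (the cited literature suggests PLS would be typical), sampling without replacement, and a stability score of |mean| / standard deviation of each regression coefficient. The function name `mpa_coefficient_stability` and its parameters are hypothetical.

```python
import numpy as np

def mpa_coefficient_stability(X, y, n_submodels=500, sample_ratio=0.8, seed=0):
    """Sketch of an MPA-style analysis of regression coefficients.

    Assumptions (not from the source): OLS sub-models, subsampling without
    replacement, and |mean| / std as the per-variable stability score.
    Requires more sampled rows than variables so each fit is well posed.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_vars = X.shape
    n_draw = int(sample_ratio * n_samples)
    coefs = np.empty((n_submodels, n_vars))

    for i in range(n_submodels):
        # Monte Carlo sampling: pick a random subset of the samples.
        idx = rng.choice(n_samples, size=n_draw, replace=False)
        # Fit one sub-model (intercept + slopes) on the subset.
        Xi = np.column_stack([np.ones(n_draw), X[idx]])
        beta, *_ = np.linalg.lstsq(Xi, y[idx], rcond=None)
        coefs[i] = beta[1:]  # keep the slopes, drop the intercept

    # Statistical analysis of the parameter across the sub-model population.
    mean = coefs.mean(axis=0)
    std = coefs.std(axis=0, ddof=1)
    stability = np.abs(mean) / std
    return mean, std, stability

# Synthetic demo: only the first two of six variables carry signal.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 6))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=100)
mean_demo, std_demo, stability_demo = mpa_coefficient_stability(X_demo, y_demo)
ranking_demo = np.argsort(stability_demo)[::-1]  # most stable variables first
```

Ranking variables by such a score is one way a population of sub-models could support variable selection; the same loop could instead collect prediction errors on the left-out samples for model assessment.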
