Discriminant analysis and feature selection in mass spectrometry imaging using constrained repeated random sampling - Cross validation (CORRS-CV).

The identification of biomarkers through mass spectrometry imaging (MSI) is gaining popularity in the clinical field. However, given the complexity of the spectral and spatial variables involved, data mining of hyperspectral images can be troublesome. Marker discovery generally depends on building classification models, which should be validated to ensure the statistical significance of the discriminant m/z values detected. Internal validation using resampling methods such as cross validation (CV) is widely used for model selection, for estimating generalization performance, and for biomarker discovery when sample sizes are limited and an independent test set is not available. Here, we introduce for the first time the use of Constrained Repeated Random Subsampling CV (CORRS-CV) on multiple images for the validation of classification models in MSI. Although several aspects must be taken into account (e.g. image size, the CORRS-CV ∂ value, the similarity across spatially close pixels, the total computation time), CORRS-CV provides more accurate estimates of model performance than k-fold CV that uses biological replicates to define the data split, in situations where biological replicates are scarce and holding images back for testing wastes valuable information. In addition, the combined use of CORRS-CV and rank products increases the robustness of the selection of discriminant features as candidate biomarkers. This is an important issue because biological, environmental and technical variability increases when analysing multiple images, especially those from human tissues collected in clinical studies.
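As an illustration of the rank-product step mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes that each resampling iteration yields one importance score per m/z feature (higher meaning more discriminant); the function name `rank_product` and the score values are hypothetical. The rank product of a feature is taken here as the geometric mean of its ranks across iterations, so features that rank near the top consistently obtain the smallest values. Significance assessment (e.g. by permutation) is omitted.

```python
import math

def rank_product(score_table):
    """Geometric mean of per-iteration ranks for each feature.

    score_table: list of iterations, each a list of importance scores
    (one per feature, higher = more discriminant). Rank 1 is the best.
    """
    n_features = len(score_table[0])
    log_sums = [0.0] * n_features
    for scores in score_table:
        # Feature indices ordered from highest to lowest score.
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order, start=1):
            log_sums[j] += math.log(rank)
    n_iterations = len(score_table)
    return [math.exp(s / n_iterations) for s in log_sums]

# Hypothetical scores for 3 features over 3 resampling iterations.
example_scores = [
    [0.9, 0.1, 0.5],
    [0.8, 0.3, 0.2],
    [0.7, 0.2, 0.6],
]
rp = rank_product(example_scores)
best_feature = min(range(len(rp)), key=lambda j: rp[j])
```

In this toy example the first feature is ranked first in every iteration, so it has the lowest rank product and would be the strongest candidate.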
