Paper information - Subspace discriminant index to expedite exploration of multi-class omics data


Abstract: Omics datasets, which comprehensively characterize biological samples at the molecular level, are continuously increasing in both complexity and dimensionality. In this scenario, tools are needed to improve data interpretability and expedite the extraction of relevant biochemical information. Here we introduce the subspace discriminant index (SDI) for multi-component models, which points to the most promising components for exploring pre-defined groups of observations and can also be used to compare several modeling variants in terms of discriminative power. The SDI is especially useful during the initial exploration of a dataset, when informed decisions must be made on, e.g., pre-processing or modeling variants for further analysis. The versatility and efficiency of the proposed index are demonstrated in two real-world omics case studies, including a highly complex multi-class problem. The code for computing the SDI is freely available in the Matlab MEDA toolbox and linked in the manuscript. By boosting interpretation capabilities, the SDI represents a significant addition to the chemometric toolbox.
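The abstract does not state how the SDI is computed, so the following is only a rough illustration of the general idea of ranking the components of a multi-component model by how well they separate pre-defined groups. It uses a generic between-group to within-group sum-of-squares ratio as a stand-in, which is an assumption and not the SDI itself. The function names, the toy scores, and the group labels are all hypothetical.

```python
from collections import defaultdict


def discriminant_ratio_per_component(scores, labels):
    """For each component (column of `scores`), return the ratio of
    between-group to within-group sum of squares.

    NOTE: this is a generic Fisher-like criterion used as a stand-in;
    it is NOT the SDI as defined in the paper.
    """
    n_components = len(scores[0])

    # Collect the score rows belonging to each pre-defined group.
    groups = defaultdict(list)
    for row, label in zip(scores, labels):
        groups[label].append(row)

    ratios = []
    for a in range(n_components):
        column = [row[a] for row in scores]
        grand_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in groups.values():
            values = [row[a] for row in rows]
            group_mean = sum(values) / len(values)
            between += len(values) * (group_mean - grand_mean) ** 2
            within += sum((v - group_mean) ** 2 for v in values)
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


def rank_components(scores, labels):
    """Return component indices sorted from most to least discriminative."""
    ratios = discriminant_ratio_per_component(scores, labels)
    return sorted(range(len(ratios)), key=lambda a: ratios[a], reverse=True)


# Hypothetical toy scores: 6 observations x 2 components, three groups.
# Component 0 carries most of the variance but does not separate the groups;
# component 1 has little variance but separates them clearly.
scores = [
    [5.0, 1.0], [-5.0, 1.2],    # group A
    [4.0, 0.0], [-4.0, 0.1],    # group B
    [3.0, -1.0], [-3.0, -1.1],  # group C
]
labels = ["A", "A", "B", "B", "C", "C"]

ratios = discriminant_ratio_per_component(scores, labels)
ranking = rank_components(scores, labels)
print(ranking)  # component index 1 is ranked ahead of component index 0
```

In this toy case the high-variance first component is ranked last, which is the kind of situation in which an index pointing to the most promising components can save exploration time.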
