Data Fusion in Metabolomics and Proteomics for Biomarker Discovery

Proteomics and metabolomics provide key insights into the status and dynamics of biological systems. These molecular studies reveal the complex mechanisms involved in disease and aging processes. Valuable information can be obtained using various analytical techniques such as nuclear magnetic resonance, liquid chromatography, or gas chromatography coupled to mass spectrometry. Each method has inherent advantages and drawbacks, but the methods are complementary in the biological information they provide. The fusion of different measurements is a complex topic. We describe here a framework for combining multiple data sets acquired on different analytical platforms. In the first step, the relevant information is extracted from each platform separately. The resulting latent variables are then fused and analyzed further. Finally, the influence of the original variables is calculated back from the fused model and interpreted.
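The three steps of the framework (per-platform extraction, fusion of latent variables, back-calculation to the original variables) can be illustrated with a minimal sketch. The abstract does not say which methods are used at each step, so everything specific here is an assumption made for illustration: principal component scores stand in for the latent variables, an ordinary least-squares fit stands in for the analysis of the fused scores, and the names `block_scores`, `fuse_and_backproject`, `X_nmr`, and `X_ms` are hypothetical. The data are synthetic.

```python
import numpy as np


def block_scores(X, n_components):
    """Step 1: mean-center one platform's data and extract PCA scores/loadings."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T   # loadings, shape (variables, components)
    T = Xc @ P                # scores, shape (samples, components)
    return T, P


def fuse_and_backproject(blocks, y, n_components=2):
    """Steps 2 and 3: fuse block scores, fit a linear model, map weights back."""
    scores, loadings = [], []
    for X in blocks:
        T, P = block_scores(X, n_components)
        scale = np.linalg.norm(T)      # Frobenius norm, so no block dominates
        scores.append(T / scale)
        loadings.append(P / scale)     # same scaling keeps T = Xc @ P valid
    T_fused = np.hstack(scores)        # step 2: concatenate latent variables

    # Assumed analysis step: least-squares fit of the centered response.
    w, *_ = np.linalg.lstsq(T_fused, y - y.mean(), rcond=None)

    # Step 3: one coefficient per original variable, per platform.
    coefs = []
    for i, P in enumerate(loadings):
        w_block = w[i * n_components:(i + 1) * n_components]
        coefs.append(P @ w_block)
    return T_fused, w, coefs


# Synthetic example: 20 samples, two classes, two hypothetical platforms.
rng = np.random.default_rng(0)
y = np.array([0.0] * 10 + [1.0] * 10)
X_nmr = rng.normal(size=(20, 50))
X_nmr[:, 3] += 2.0 * y                 # one class-dependent NMR variable
X_ms = rng.normal(size=(20, 30))
X_ms[:, 7] += 2.0 * y                  # one class-dependent MS variable

T_fused, w, coefs = fuse_and_backproject([X_nmr, X_ms], y, n_components=2)
```

Because each block's scores are a linear function of its centered data, the fitted values computed from the fused scores are identical to those computed by applying `coefs` to the centered original variables. That identity is what makes the back-calculated coefficients interpretable as the contribution of each original variable. A non-linear fusion (for example in kernel space) would need a different back-calculation.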
