Forecasting Chronic Diseases Using Data Fusion

Data fusion, that is, extracting information by jointly analyzing complementary data sets, is of great interest in metabolomics. The analytical platforms commonly used for chemical profiling of biofluids, such as liquid chromatography-mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) spectroscopy, provide complementary information. In this study, with the goal of forecasting acute coronary syndrome (ACS), breast cancer, and colon cancer, we jointly analyzed LC-MS and NMR measurements of plasma samples together with metadata on the participants' lifestyles. We used supervised data fusion based on multiple kernel learning. Because the models are linear, we could identify the metabolites/features that are significant for separating healthy referents from cases who later developed a disease. We show three results. (i) Fusing LC-MS, NMR, and metadata separated ACS cases from referents better than any individual data set. (ii) NMR data alone performed best for forecasting breast cancer, while fusion degraded performance. (iii) Neither the individual data sets nor their fusion performed well for colon cancer. Furthermore, we illustrate the strengths and limitations of the fusion models by examining how well they capture known biomarkers of smoking and coffee consumption. Thus, jointly analyzing metabolomics data and metadata may improve the separation of certain conditions, but fusion is not always the best approach, as the breast cancer results show.
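The abstract does not spell out the kernel combination or the classifier. The following is therefore a minimal sketch under stated assumptions, not the authors' implementation. It illustrates why linearity makes a multiple-kernel fusion model interpretable.

Each data block m (for example LC-MS or NMR) gets a linear kernel K_m = X_m X_mᵀ. The fused kernel is the weighted sum K = Σ_m β_m K_m. A dual model with coefficients α then has explicit per-feature weights w_m = β_m X_mᵀ α for each block.

The following choices are assumptions: the block weights β_m are fixed by hand, whereas a real MKL solver would learn them. Kernel ridge regression without an intercept is swapped in for the classifier. The toy data and all function names (`combine_kernels`, `fit_kernel_ridge`, `block_feature_weights`) are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(X):
    """Gram matrix K[i][j] = <x_i, x_j> for one data block (rows = samples)."""
    return [[dot(xi, xj) for xj in X] for xi in X]

def combine_kernels(kernels, betas):
    """Weighted sum of per-block kernels, K = sum_m beta_m * K_m."""
    n = len(kernels[0])
    return [[sum(b * K[i][j] for K, b in zip(kernels, betas)) for j in range(n)]
            for i in range(n)]

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_kernel_ridge(K, y, lam=1.0):
    """Dual coefficients alpha = (K + lam * I)^-1 y (no intercept, for brevity)."""
    n = len(K)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, y)

def block_feature_weights(X, alpha, beta):
    """With a linear kernel the dual model has explicit weights
    w_m = beta_m * X_m^T alpha: one weight per feature in block m."""
    return [beta * sum(a * x[j] for a, x in zip(alpha, X)) for j in range(len(X[0]))]

# Hypothetical toy data: two blocks measured on the same 40 samples.
random.seed(0)
n = 40
block_a = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]  # stand-in for LC-MS
block_b = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]  # stand-in for NMR
y = [1.0 if row[0] > 0 else -1.0 for row in block_a]  # label driven by one block-A feature

betas = [0.5, 0.5]  # fixed here; an MKL solver would learn these
K = combine_kernels([linear_kernel(block_a), linear_kernel(block_b)], betas)
alpha = fit_kernel_ridge(K, y, lam=1.0)

w_a = block_feature_weights(block_a, alpha, betas[0])
w_b = block_feature_weights(block_b, alpha, betas[1])

# Kernel-space scores and explicit linear scores are the same function.
f_kernel = [sum(alpha[k] * K[k][i] for k in range(n)) for i in range(n)]
f_linear = [dot(w_a, block_a[i]) + dot(w_b, block_b[i]) for i in range(n)]

print("block A weights:", [round(w, 3) for w in w_a])
print("block B weights:", [round(w, 3) for w in w_b])
```

Ranking the entries of w_a and w_b by magnitude is one way to flag candidate features per block. In this toy setup, the feature that drives the label should receive the largest weight.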
