Enhancing metabolomics research through data mining.

Metabolomics research, like other disciplines that use high-throughput technologies, generates a large amount of data for every sample. Although handling these data is a challenge and one of the biggest bottlenecks of the metabolomics workflow, it is also the key to obtaining valuable results. This work provides methodological data mining guidelines that systematically describe the steps to follow in metabolomics data exploration. Refinement of the instrumental raw data in the pre-processing step and assessment of the statistical assumptions in pre-treatment directly affect the results of subsequent univariate and multivariate analyses. A study of aging in a healthy population was selected to illustrate this data mining process. Multivariate analysis of variance and linear regression methods were used to analyze the metabolic changes underlying aging. The two multivariate methods were chosen to illustrate the treatment of age from two rather different perspectives: as a categorical variable and as a continuous variable.

BIOLOGICAL SIGNIFICANCE Metabolomics is a discipline that involves analyzing a large amount of data to gather relevant information. Researchers in this field have to overcome the challenges of complex data processing and statistical analysis. A wide range of tasks has to be carried out, from minimizing batch-to-batch and systematic variation in pre-processing to applying common data analysis techniques that rely on statistical assumptions. In this work, a metabolic profiling study of aging based on real data was used to illustrate the proposed workflow and to suggest a set of guidelines for analyzing metabolomics data. This article is part of a Special Issue entitled: HUPO 2014.
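The two perspectives on age can be made concrete with a minimal sketch. It is not the authors' analysis. The data are synthetic, the function names and the age cutoffs (40 and 60 years) are hypothetical, and a univariate one-way ANOVA F statistic on a single metabolite stands in for the multivariate analysis of variance named in the abstract. A log transform is assumed as the pre-treatment step.

```python
import math
import random
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_within += sum((v - m) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def age_group(age, cutoffs=(40, 60)):
    """Hypothetical bins: 0 = under 40, 1 = 40 to 59, 2 = 60 and over."""
    return sum(age >= c for c in cutoffs)


# Synthetic data (assumption): one metabolite whose log-intensity rises with age.
rng = random.Random(0)
ages = [rng.uniform(20, 80) for _ in range(120)]
intensity = [math.exp(0.02 * a + rng.gauss(0, 0.3)) for a in ages]

# Pre-treatment (assumption): log transform to bring intensities closer to normality.
log_intensity = [math.log(v) for v in intensity]

# Age as a continuous variable: linear regression of log-intensity on age.
slope, intercept = ols_slope(ages, log_intensity)

# Age as a categorical variable: compare group means across age bins.
groups = [[], [], []]
for a, y in zip(ages, log_intensity):
    groups[age_group(a)].append(y)
f_stat = one_way_f(groups)

print(f"slope per year: {slope:.4f}, one-way F: {f_stat:.1f}")
```

Binning age discards the ordering within each group, while the regression assumes a linear trend over the whole range, so the two views can disagree when the change with age is not monotonic.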
