Machine Learning in Untargeted Metabolomics Experiments.

Machine learning is a form of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed: programs adapt when exposed to new data. Here we examine the application of machine learning to untargeted metabolomics data, when it is appropriate to use, and which questions it can answer. We provide an example workflow for training and testing a simple binary classifier, a multiclass classifier, and a support vector machine using the Waikato Environment for Knowledge Analysis (Weka), a machine learning toolkit. This workflow should provide a framework for greater integration of machine learning into metabolomics studies.
