Wisdom of artificial crowds feature selection in untargeted metabolomics: An application to the development of a blood-based diagnostic test for thrombotic myocardial infarction

INTRODUCTION Heart disease remains a leading cause of global mortality. Acute myocardial infarction (MI; colloquially, heart attack) has multiple proximate causes, but no blood-based diagnostic test can currently determine the proximate etiology. We enrolled a suitable patient cohort and conducted a non-targeted quantification of plasma metabolites by mass spectrometry in order to develop a test that can differentiate between thrombotic MI, non-thrombotic MI, and stable coronary artery disease. A significant challenge in developing such a test is solving the NP-hard problem of selecting the features for an optimal statistical classifier.

OBJECTIVE We employed a Wisdom of Artificial Crowds (WoAC) strategy to solve the feature selection problem and compared the accuracy and parsimony of the resulting downstream classifiers with those obtained using traditional feature selection techniques, including the Lasso and selection by Random Forest variable importance criteria.

MATERIALS AND METHODS Artificial crowd wisdom was generated by aggregating the best solutions from independent and diverse genetic algorithm populations that were initialized with bootstrapping and a random subspaces constraint.

RESULTS/CONCLUSIONS We observed strong evidence that a statistical classifier using WoAC feature selection can discriminate between human subjects presenting with thrombotic MI, non-thrombotic MI, and stable coronary artery disease, given the abundances of selected plasma metabolites. Using the abundances of twenty selected metabolites, the misclassification rate estimated by leave-one-out cross-validation was 2.6%. However, the WoAC feature selection strategy did not outperform the Lasso in the current study.
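The WoAC procedure summarized in MATERIALS AND METHODS can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several pieces are assumptions chosen only for brevity: the nearest-centroid classifier, the out-of-bag fitness with a 0.01-per-feature size penalty, truncation selection, subset crossover, the toggle mutation, every hyperparameter, the aggregation rule (simple vote counting), and the toy data generator `make_demo` are all hypothetical. Each genetic algorithm population is trained on a bootstrap sample of subjects and restricted to a random subspace of features. The best subset from each population casts one vote per feature, and the most frequently chosen features form the crowd's answer, which is then scored by leave-one-out cross-validation.

```python
import random
from collections import Counter


def fit_centroids(X, y, feats):
    """Class centroids computed only on the feature indices listed in `feats`."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, x, feats):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum(
        (x[f] - c) ** 2 for f, c in zip(feats, centroids[lab])))


def error_rate(train_X, train_y, test_X, test_y, feats):
    """Misclassification rate on the test rows of a classifier fit on the train rows."""
    cents = fit_centroids(train_X, train_y, feats)
    wrong = sum(predict(cents, x, feats) != t for x, t in zip(test_X, test_y))
    return wrong / len(test_y)


def loocv_error(X, y, feats):
    """Leave-one-out cross-validated misclassification rate."""
    return sum(error_rate(X[:i] + X[i + 1:], y[:i] + y[i + 1:], [X[i]], [y[i]], feats)
               for i in range(len(X))) / len(X)


def run_ga(trX, trY, teX, teY, subspace, rng, pop_size=16, generations=10, max_feats=4):
    """One GA population searching subsets of `subspace`; returns its best subset."""
    cache = {}

    def fitness(ind):
        # Hypothetical fitness: out-of-bag error plus a small parsimony penalty.
        if ind not in cache:
            cache[ind] = error_rate(trX, trY, teX, teY, sorted(ind)) + 0.01 * len(ind)
        return cache[ind]

    pop = [frozenset(rng.sample(subspace, rng.randint(1, max_feats)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(a | b)                   # subset crossover from the union
            child = set(rng.sample(pool, rng.randint(1, min(len(pool), max_feats))))
            if rng.random() < 0.3:                 # mutation: toggle one feature
                child ^= {rng.choice(subspace)}
            if child:
                children.append(frozenset(child))
        pop = parents + children
    return min(pop, key=fitness)


def woac_select(X, y, n_crowd=6, subspace_size=6, top_k=3, seed=0):
    """Aggregate the best subsets of independent GA populations by vote count."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = Counter()
    for _ in range(n_crowd):
        boot = [rng.randrange(n) for _ in range(n)]             # bootstrap rows
        in_bag = set(boot)
        # Out-of-bag rows; fall back to the bootstrap rows if none are left out.
        oob = [i for i in range(n) if i not in in_bag] or boot
        subspace = rng.sample(range(p), subspace_size)           # random subspace
        best = run_ga([X[i] for i in boot], [y[i] for i in boot],
                      [X[i] for i in oob], [y[i] for i in oob], subspace, rng)
        votes.update(best)
    # Crowd answer (assumed rule): the features chosen most often across populations.
    return [f for f, _ in votes.most_common(top_k)], votes


def make_demo(n_per_class=10, p=10, seed=1):
    """Hypothetical toy data: features 0 and 1 separate three classes; the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(p)]
            row[0] += 6 * c
            row[1] += 6 * c
            X.append(row)
            y.append(c)
    return X, y
```

On the toy data, `woac_select(*make_demo())` should tend to favor features 0 and 1, the only informative ones, and passing the selected indices to `loocv_error` gives a leave-one-out estimate analogous in spirit to the one reported in the abstract. On real metabolite abundances, the classifier, fitness function, and hyperparameters would all need to be reconsidered.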
