Bayesian Posterior Integration for Classification of Mass Spectrometry Data

High-throughput technologies can now capture information at both global and targeted scales for the transcriptome, proteome, and metabolome, and can determine functional aspects of these biomolecules. The promise of data integration is that combining these disparate data streams yields a more accurate predictive model of the phenotype of interest by identifying the best subset of molecules associated with the outcome. However, in a space of tens of thousands of variables (e.g., genes, proteins), feature selection approaches often yield over-trained models with poor predictive power. Moreover, feature selection algorithms typically focus on a single source of data and do not evaluate the effect on downstream statistical integration models. The integration of Bayesian statistical outputs has been shown to be an effective approach that optimizes the outcome of interest in the context of the integrated posterior probability. This chapter demonstrates that this approach can improve sensitivity and specificity over simple selection routines based on individual high-throughput datasets generated via mass spectrometry.
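To make the idea of an integrated posterior probability concrete, the following is a minimal sketch of one common way to combine class posteriors from separate data sources. It is not necessarily the chapter's method: it assumes the sources are conditionally independent given the class and that each per-source model used the same class prior. The function name `integrate_posteriors` and the example numbers are hypothetical.

```python
from math import prod


def integrate_posteriors(posteriors, prior):
    """Combine per-source class posteriors into one integrated posterior.

    posteriors[d][c] is P(class c | data source d).
    prior[c] is P(class c), assumed shared by every per-source model.
    Assumption: sources are conditionally independent given the class, so
    P(c | x_1..x_D) is proportional to prior[c]**(1 - D) * prod_d posteriors[d][c].
    """
    n_sources = len(posteriors)
    unnormalized = [
        prior[c] ** (1 - n_sources) * prod(p[c] for p in posteriors)
        for c in range(len(prior))
    ]
    total = sum(unnormalized)
    # Normalize so the integrated posterior sums to one over classes.
    return [u / total for u in unnormalized]


# Hypothetical two-class example: one posterior per data source.
proteomics = [0.7, 0.3]
metabolomics = [0.6, 0.4]
integrated = integrate_posteriors([proteomics, metabolomics], prior=[0.5, 0.5])

# Classify by maximum integrated posterior probability.
predicted_class = max(range(len(integrated)), key=integrated.__getitem__)
```

Under these assumptions, two sources that each weakly favor the same class produce an integrated posterior that favors it more strongly than either source alone. A feature selection routine could then be scored on this integrated posterior rather than on each dataset separately.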
