Topological and kernel-based microbial phenotype prediction from MALDI-TOF mass spectra

Abstract

Motivation: Microbial species identification based on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has become a standard tool in clinical microbiology. The resulting MALDI-TOF mass spectra could also be used to predict other phenotypes, such as antibiotic resistance. However, machine learning algorithms tailored specifically to MALDI-TOF MS-based phenotype prediction are still in their infancy. Moreover, current spectral pre-processing typically involves a parameter-heavy chain of operations whose influence on the prediction results has not been analyzed. In addition, existing classification algorithms do not quantify the uncertainty of their predictions, which is indispensable when those predictions may influence patient treatment.

Results: We present a novel method for predicting antimicrobial resistance from MALDI-TOF mass spectra. First, we compare the complex conventional pre-processing to a new approach that exploits topological information and requires only a single parameter, namely the number of peaks to keep per spectrum. Second, we introduce PIKE, the peak information kernel, a similarity measure tailored specifically to MALDI-TOF mass spectra which, combined with a Gaussian process classifier, provides well-calibrated uncertainty estimates for its predictions. We demonstrate the utility of our approach by predicting antibiotic resistance in three clinically highly relevant bacterial species. Our method consistently outperforms competitor approaches and further improves performance and security by rejecting out-of-distribution samples, such as bacterial species that are not represented in the training data. Ultimately, our method could contribute to earlier and more precise antimicrobial treatment in clinical patient care.

Availability and implementation: We make our code publicly available as an easy-to-use Python package at https://github.com/BorgwardtLab/maldi_PIKE.
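The abstract does not spell out how the topological pre-processing works. A minimal sketch of one plausible reading, assuming that "topological information" means 0-dimensional persistent homology of the intensity profile: every local maximum is assigned a persistence (its height above the saddle at which it merges into a taller peak), and only the `k` most persistent peaks are kept, so `k` is the single parameter. The names `peak_persistence` and `top_k_peaks` are hypothetical and are not the API of the maldi_PIKE package.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers: a sketch of persistence-based peak selection,
# not the maldi_PIKE implementation.


def peak_persistence(intensity: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (index, persistence) for each local maximum of a 1D signal.

    Superlevel-set filtration: samples are added from highest to lowest
    intensity while a union-find structure tracks connected components.
    A component is born at a local maximum; when two components meet, the
    one with the lower peak dies there (elder rule).
    """
    n = len(intensity)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: intensity[i], reverse=True)
    parent = [-1] * n             # -1 marks samples not yet added
    peak_of: Dict[int, int] = {}  # component root -> index of its highest sample
    result: List[Tuple[int, float]] = []

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        roots = {find(j) for j in (i - 1, i + 1) if 0 <= j < n and parent[j] != -1}
        if not roots:
            peak_of[i] = i  # no added neighbour: i is a local maximum
        elif len(roots) == 1:
            parent[i] = roots.pop()  # i extends an existing component
        else:
            # Two components meet at i; order them by peak height (ties: lower index is elder).
            elder, younger = sorted(
                roots, key=lambda r: (intensity[peak_of[r]], -peak_of[r]), reverse=True
            )
            p = peak_of[younger]
            result.append((p, intensity[p] - intensity[i]))  # younger peak dies here
            parent[younger] = elder
            parent[i] = elder
    # The tallest peak never dies; give it the full dynamic range of the signal.
    p = peak_of[find(0)]
    result.append((p, intensity[p] - min(intensity)))
    return result


def top_k_peaks(intensity: Sequence[float], k: int) -> List[int]:
    """Indices of the k most persistent peaks, in ascending index (m/z) order."""
    ranked = sorted(peak_persistence(intensity), key=lambda t: t[1], reverse=True)
    return sorted(i for i, _ in ranked[:k])
```

On the toy signal `[0, 3, 1, 5, 2.5, 4, 0]`, `top_k_peaks(signal, 2)` returns `[1, 3]`: the peak of height 4 at index 5 is taller than the peak of height 3 at index 1, but it merges into the global maximum at a higher saddle (2.5) and is therefore less persistent. Ranking by persistence rather than raw height is what lets a single parameter `k` stand in for separate smoothing, baseline, and thresholding steps.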

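The abstract names PIKE but does not state its formula. As a hedged sketch, assuming a PIKE-style construction in which each pre-processed spectrum is a set of (m/z, intensity) peaks that is smoothed by heat diffusion for a time `t` before comparison, the similarity reduces to a closed-form double sum over peak pairs. The names `diffusion_peak_kernel`, `normalized_kernel`, the `Peak` tuple layout, and the parameter `t` are illustrative assumptions, not the interface of the maldi_PIKE package, and the normalising constants may differ from the paper's definition.

```python
import math
from typing import Sequence, Tuple

# Hypothetical peak representation: (m/z position, intensity).
# A sketch of a diffusion-based peak kernel, not the maldi_PIKE API.
Peak = Tuple[float, float]


def diffusion_peak_kernel(x: Sequence[Peak], y: Sequence[Peak], t: float) -> float:
    """L2 inner product of two peak lists after heat diffusion for time t.

    Each spectrum is modelled as a weighted sum of Dirac deltas. Solving the
    heat equation u_t = u_xx for time t turns every delta into a Gaussian of
    variance 2t, and the inner product of two such Gaussians is a Gaussian of
    variance 4t evaluated at the distance between the two peaks.
    """
    if t <= 0:
        raise ValueError("diffusion time t must be positive")
    scale = 1.0 / math.sqrt(8.0 * math.pi * t)
    total = 0.0
    for xi, lam in x:
        for yj, mu in y:
            total += lam * mu * math.exp(-((xi - yj) ** 2) / (8.0 * t))
    return scale * total


def normalized_kernel(x: Sequence[Peak], y: Sequence[Peak], t: float) -> float:
    """Cosine-normalised variant, so that every spectrum has self-similarity 1."""
    kxx = diffusion_peak_kernel(x, x, t)
    kyy = diffusion_peak_kernel(y, y, t)
    return diffusion_peak_kernel(x, y, t) / math.sqrt(kxx * kyy)
```

Because the value is an L2 inner product of the diffused spectra, the kernel is positive semi-definite, so a Gram matrix built from it can be handed to kernel methods; the method described above pairs PIKE with a Gaussian process classifier, which this sketch does not implement. A larger `t` lets peaks whose m/z values are slightly shifted between measurements still contribute to the similarity.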