Investigating Time Series Classification Techniques for Rapid Pathogen Identification with Single-Cell MALDI-TOF Mass Spectrum Data

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a well-established technology that is widely used for species identification. Specifically, MALDI-TOF MS is applied to samples that usually contain bacterial cells, generating signals that are representative of the various bacterial species. However, a reliable identification result requires a significant amount of biomass. For most samples used in the diagnosis of infectious diseases, the sample volume is too low to provide the required amount of biomass. Therefore, the bacterial load is amplified in a culturing phase. If the MALDI process could be applied to individual bacteria, the need for culturing and isolation could be circumvented, accelerating the whole process. In this paper, we briefly describe an implementation of a MALDI-TOF MS procedure for individual cells and demonstrate the use of the resulting data for pathogen identification. Pathogens (bacterial species) are identified by applying machine learning algorithms to the generated single-cell signals. The high predictive performance of the machine learning models indicates that the produced bacterial signatures constitute an informative representation that helps distinguish the different bacterial species. In addition, we reformulate the bacterial species identification problem as a time series classification task by treating the intensity sequence of a given spectrum as time series values. Experimental results show that algorithms originally introduced for time series analysis are beneficial for modelling single-cell MALDI-TOF MS observations.
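The reformulation described above, in which the ordered intensities of a spectrum are read as a time series, can be illustrated with a minimal sketch. It is not the authors' pipeline: the spectra are synthetic, the peak positions, noise level, bin count, and species labels are all hypothetical, and a 1-nearest-neighbour classifier with Euclidean distance stands in as a common time series classification baseline.

```python
import math
import random


def synthetic_spectrum(peak_bins, rng, n_bins=200, noise=0.05):
    """Build a toy spectrum: Gaussian peaks at peak_bins plus Gaussian noise.

    The intensities are ordered by m/z bin, and that ordering is what lets
    the spectrum be treated as a time series.
    """
    spectrum = []
    for i in range(n_bins):
        signal = sum(math.exp(-0.5 * ((i - p) / 3.0) ** 2) for p in peak_bins)
        spectrum.append(signal + rng.gauss(0.0, noise))
    return spectrum


def euclidean(a, b):
    """Euclidean distance between two equal-length intensity sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(train_X, train_y, query):
    """Return the label of the training spectrum closest to the query."""
    best = min(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return train_y[best]


def make_dataset(n_per_class, rng):
    """Generate labelled toy spectra for two hypothetical species."""
    # Assumed peak positions, chosen only for illustration.
    classes = {"species_A": [40, 120], "species_B": [60, 150]}
    X, y = [], []
    for label, peaks in classes.items():
        for _ in range(n_per_class):
            # Shift each peak by at most one bin to mimic small calibration drift.
            jittered = [p + rng.randint(-1, 1) for p in peaks]
            X.append(synthetic_spectrum(jittered, rng))
            y.append(label)
    return X, y


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train_X, train_y = make_dataset(20, rng)
test_X, test_y = make_dataset(10, rng)

predictions = [predict_1nn(train_X, train_y, s) for s in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"1-NN accuracy on synthetic spectra: {accuracy:.2f}")
```

Real single-cell spectra would replace `make_dataset`, and dedicated time series classifiers (interval-based, dictionary-based, or elastic-distance methods) could replace `predict_1nn` without changing how the spectra are represented.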
