DART-ID increases single-cell proteome coverage

Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those composed of thousands of cells. Identifying proteins by LC-MS/MS proteomics, however, remains challenging for low-abundance samples, such as the proteomes of single mammalian cells. To increase the identification rate of peptides in such small samples, we developed DART-ID. This method implements a data-driven, global retention time (RT) alignment process to infer peptide RTs across experiments. DART-ID then incorporates the global RT estimates within a principled Bayesian framework to increase the confidence in correct peptide-spectrum matches. Applying DART-ID to hundreds of samples prepared by the Single Cell Proteomics by Mass Spectrometry (SCoPE-MS) design increased the peptide and proteome coverage by 30–50% at a 1% false discovery rate (FDR). The newly identified peptides and proteins were further validated by demonstrating that their quantification is consistent with the quantification of peptides identified from high-quality spectra. DART-ID can be applied to various experimental designs with similar sample complexities and chromatography conditions, and is freely available online.

Author Summary

Identifying and quantifying proteins in single cells gives researchers the ability to tackle complex biological problems that involve single-cell heterogeneity, such as the treatment of solid tumors. However, the mass spectra from the analysis of single cells do not support sequence identification for all analyzed peptides. To improve identification rates, we use the retention time of peptide sequences from liquid chromatography, a process used to separate peptides before their analysis by mass spectrometry. We present both a novel method for aligning the retention times of peptides across experiments and a rigorous framework for using the estimated retention times to enhance peptide sequence identification. Incorporating the retention time as additional evidence leads to a substantial increase in the number of proteins that can be quantified and biologically analyzed by single-cell mass spectrometry.
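
The two steps described above, aligning retention times across experiments and then using the aligned retention time as extra evidence for a peptide-spectrum match (PSM), can be illustrated with a minimal Python sketch. This is not the DART-ID implementation: the straight-line alignment, the normal densities, and every function and parameter name below (`fit_linear_alignment`, `updated_pep`, `sigma_correct`, `null_mu`, `null_sigma`) are assumptions chosen for illustration only.

```python
import math


def fit_linear_alignment(run_rts, ref_rts):
    """Least-squares line mapping one run's RTs onto a reference RT scale.

    Assumption: a single straight line is enough for illustration.
    run_rts and ref_rts are RTs of the same peptides, in the same order.
    """
    n = len(run_rts)
    mean_x = sum(run_rts) / n
    mean_y = sum(ref_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in run_rts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(run_rts, ref_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def updated_pep(pep, rt_obs, rt_ref, sigma_correct, null_mu, null_sigma):
    """Combine a spectrum-only error probability with RT evidence via Bayes' rule.

    pep           -- prior probability that the PSM is incorrect (from the spectrum)
    rt_obs        -- observed RT of the PSM on the reference scale
    rt_ref        -- reference RT inferred for the matched peptide
    sigma_correct -- assumed RT spread for correct matches
    null_mu, null_sigma -- assumed RT distribution for incorrect matches
    """
    lik_incorrect = normal_pdf(rt_obs, null_mu, null_sigma)
    lik_correct = normal_pdf(rt_obs, rt_ref, sigma_correct)
    numerator = pep * lik_incorrect
    return numerator / (numerator + (1.0 - pep) * lik_correct)


# Hypothetical numbers: a PSM with a spectrum-only error probability of 0.2.
slope, intercept = fit_linear_alignment([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
close = updated_pep(0.2, 30.1, 30.0, 0.5, 40.0, 15.0)  # RT agrees with reference
far = updated_pep(0.2, 32.0, 30.0, 0.5, 40.0, 15.0)    # RT disagrees
print(slope, intercept, close < 0.2, far > 0.2)
```

In this sketch, an observed RT close to the peptide's reference RT lowers the error probability, while an RT far from the reference raises it, which is the intuition behind using RT as additional evidence for identification.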
