DtaRefinery, a Software Tool for Elimination of Systematic Errors from Parent Ion Mass Measurements in Tandem Mass Spectra Data Sets

Hybrid two-stage mass spectrometers capable of both highly accurate mass measurement and high-throughput MS/MS fragmentation have become widely available in recent years. They allow significantly better discrimination between true and false MS/MS peptide identifications by applying a relatively narrow window for the maximum allowable deviation of measured parent ion masses. To gain the full advantage of highly accurate parent ion mass measurements, it is important to limit systematic mass measurement errors. Building on our previous studies of systematic biases in mass measurement errors, we have designed an algorithm and software tool that eliminates the systematic errors from the peptide ion masses in MS/MS data. We demonstrate that eliminating the systematic mass measurement errors allows the use of tighter criteria on the deviation of measured mass from theoretical monoisotopic peptide mass, which reduces both the false discovery and false negative rates of peptide identification. A software implementation of this algorithm, called DtaRefinery, reads a set of fragmentation spectra, searches for MS/MS peptide identifications using a FASTA file containing expected protein sequences, fits a regression model that estimates systematic errors, and then corrects the parent ion mass entries by removing the estimated systematic error components. The output is a new file of fragmentation spectra with updated parent ion masses. The software is freely available.
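The correction step described above can be illustrated with a minimal sketch. It is not the DtaRefinery implementation. The abstract does not specify the regression model or its predictors, so the sketch assumes an ordinary least-squares line fitted to the parts-per-million (ppm) error as a function of measured mass. The function names (`ppm_error`, `fit_line`, `correct_mass`) and the synthetic masses are hypothetical and chosen only for illustration.

```python
from statistics import mean


def ppm_error(measured, theoretical):
    """Relative mass measurement error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Assumption: a straight line stands in for the regression model,
    which the abstract does not specify.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def correct_mass(measured, slope, intercept):
    """Remove the estimated systematic error (in ppm) from a measured mass."""
    predicted_ppm = slope * measured + intercept
    return measured / (1 + predicted_ppm * 1e-6)


# Synthetic "identified peptides": theoretical masses plus a made-up
# systematic bias of (3.0 + 0.002 * mass) ppm. Not real data.
theoretical = [800.4, 1200.6, 1600.8, 2001.0, 2401.2]
measured = [m * (1 + (3.0 + 0.002 * m) * 1e-6) for m in theoretical]

# Errors before correction, used as the regression response.
before = [ppm_error(x, t) for x, t in zip(measured, theoretical)]
slope, intercept = fit_line(measured, before)

# Apply the fitted model to every parent ion mass and re-evaluate the errors.
corrected = [correct_mass(x, slope, intercept) for x in measured]
after = [ppm_error(c, t) for c, t in zip(corrected, theoretical)]
```

In this toy setting the errors in `before` are several ppm, while those in `after` are close to zero, which is what allows a tighter parent ion mass tolerance window to be applied to the corrected data. Real data also contain random error that no regression model can remove.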
