Isotope pattern deconvolution for peptide mass spectrometry by non-negative least squares/least absolute deviation template matching

Background: The robust identification of isotope patterns originating from peptides being analyzed through mass spectrometry (MS) is often significantly hampered by noise artifacts and the interference of overlapping patterns arising, e.g., from post-translational modifications. As the classification of the recorded data points into either 'noise' or 'signal' lies at the very root of essentially every proteomic application, the quality of the automated processing of mass spectra can significantly influence the way the data might be interpreted within a given biological context.

Results: We propose non-negative least squares/non-negative least absolute deviation regression to fit a raw spectrum by templates imitating isotope patterns. In a carefully designed validation scheme, we show that the method exhibits excellent performance in pattern picking. It is demonstrated that the method is able to disentangle complicated overlaps of patterns.

Conclusions: We find that regularization is not necessary to prevent overfitting and that thresholding is an effective and user-friendly way to perform feature selection. The proposed method avoids problems inherent in regularization-based approaches, comes with a set of well-interpretable parameters whose default configuration is shown to generalize well without the need for fine-tuning, and is applicable to spectra of different platforms. The R package IPPD implements the method and is available from the Bioconductor platform (http://bioconductor.fhcrc.org/help/bioc-views/devel/bioc/html/IPPD.html).
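The idea of fitting a spectrum by non-negative combinations of isotope-pattern templates and then thresholding the coefficients can be illustrated with a minimal sketch. This is not the IPPD implementation or its API. The Gaussian peak shape, the fixed relative isotope heights, the candidate grid, the noise level, the threshold value, and the helper names `isotope_template` and `fit_templates` are all assumptions chosen for illustration. Only the least squares variant is shown, using `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls

ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def isotope_template(mz, start, charge, heights, width=0.02):
    """Hypothetical template: Gaussian peaks spaced ISOTOPE_SPACING / charge apart."""
    spacing = ISOTOPE_SPACING / charge
    centers = start + spacing * np.arange(len(heights))
    # One Gaussian per isotope peak, evaluated on the whole m/z grid.
    peaks = np.exp(-0.5 * ((mz[:, None] - centers[None, :]) / width) ** 2)
    return peaks @ np.asarray(heights, dtype=float)


def fit_templates(mz, spectrum, candidates, heights, threshold):
    """Fit the spectrum by NNLS, then keep templates whose coefficient exceeds the threshold."""
    design = np.column_stack(
        [isotope_template(mz, start, charge, heights) for start, charge in candidates]
    )
    coef, _residual_norm = nnls(design, spectrum)
    selected = [c for c, b in zip(candidates, coef) if b > threshold]
    return coef, selected


# Synthetic example (all values are illustrative assumptions).
mz = np.arange(1000.0, 1004.0, 0.005)
heights = [1.0, 0.6, 0.25]  # assumed relative isotope intensities
candidates = [(s, z) for s in (1000.5, 1001.0, 1001.5) for z in (1, 2)]

# Two overlapping patterns: charge 1 at 1000.5 and charge 2 at 1001.0, plus noise.
rng = np.random.default_rng(0)
spectrum = (
    10.0 * isotope_template(mz, 1000.5, 1, heights)
    + 5.0 * isotope_template(mz, 1001.0, 2, heights)
    + rng.normal(0.0, 0.05, size=mz.size)
)

coef, selected = fit_templates(mz, spectrum, candidates, heights, threshold=1.0)
print(selected)
```

In this toy setting the two true patterns share a peak near m/z 1001.5, yet the non-negativity constraint alone keeps the spurious coefficients small, and a simple threshold then selects the templates. No regularization term is involved.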
