Computational quality control tools for mass spectrometry proteomics

As mass‐spectrometry‐based proteomics has matured over the past decade, quality control has received growing emphasis, and multiple computational quality control tools have been introduced for this purpose. These tools generate sets of metrics that can be used to assess the quality of a mass spectrometry experiment. Here we review which types of quality control metrics can be generated and how they can be used to monitor both intra‐ and inter‐experiment performance. We discuss the principal computational tools for quality control and list their main characteristics and applicability. Because most of these tools target specific use cases, comparing their performance is not straightforward. For this survey, we derived different sets of quality control metrics from information at various stages of a mass spectrometry process and used a supervised learning approach to evaluate how effectively they capture qualitative information about an experiment. Furthermore, we discuss currently available algorithmic solutions that enable the use of these quality control metrics for decision‐making.
