Review of Issues and Solutions to Data Analysis Reproducibility and Data Quality in Clinical Proteomics.

In any analytical discipline, data analysis reproducibility is closely interlinked with data quality. In this book chapter, which focuses on mass spectrometry-based proteomics, we describe how data analysis reproducibility and data quality influence each other. We also describe how attention to data quality and careful data analysis design can increase robustness and improve reproducibility. We first introduce methods and concepts for designing and maintaining robust data analysis pipelines, so that reproducibility improves alongside robustness. The technical aspects of data analysis reproducibility are challenging, and current approaches to increasing overall robustness are multifaceted; software containerization and cloud infrastructures play an important part. Experimental variability plays a substantial role in analysis reproducibility. We therefore also show how quality control (QC) and quality assessment (QA) approaches can be used to spot analytical issues, reduce experimental variability, and increase confidence in the analytical results of (clinical) proteomics studies. To this end, we give an overview of existing QC/QA solutions, including different quality metrics and methods for longitudinal monitoring. Used efficiently, both types of approaches offer a way to improve the reliability, reproducibility, and consistency of proteomics analytical measurements.
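Longitudinal monitoring of QC metrics is commonly done with statistical process control. The following is a minimal sketch, not a method prescribed here. It assumes a single scalar QC metric, taken to be a hypothetical count of peptide identifications per run of a QC standard. It also assumes that control limits are estimated from a set of baseline runs. The sketch applies two widely used Westgard rules to new runs: 1-3s (one run outside mean ± 3 SD) and 2-2s (two consecutive runs beyond the same ± 2 SD limit). The metric, the numbers, and the function names are illustrative assumptions.

```python
from statistics import mean, stdev


def control_limits(baseline, k):
    """Center line and +/- k standard deviation limits from baseline QC runs."""
    center = mean(baseline)
    sd = stdev(baseline)  # sample standard deviation (n - 1 denominator)
    return center, center - k * sd, center + k * sd


def rule_1_3s(values, baseline):
    """Indices of runs outside the mean +/- 3 SD limits (Westgard 1-3s rule)."""
    _, lower, upper = control_limits(baseline, 3.0)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def rule_2_2s(values, baseline):
    """Indices of runs that are the second of two consecutive runs beyond
    the same mean + 2 SD or mean - 2 SD limit (Westgard 2-2s rule)."""
    _, lower, upper = control_limits(baseline, 2.0)
    flagged = []
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if (prev > upper and cur > upper) or (prev < lower and cur < lower):
            flagged.append(i)
    return flagged


# Hypothetical metric: peptide identifications per QC standard run.
baseline_runs = [10200, 10350, 10100, 10280, 10190, 10310]
new_runs = [10250, 10180, 9800, 10300, 10450, 10460]

print(rule_1_3s(new_runs, baseline_runs))  # [2]
print(rule_2_2s(new_runs, baseline_runs))  # [5]
```

Here, run 2 is a single large drop, and runs 4 and 5 form a smaller but sustained upward shift. The 3 SD rule alone would miss that shift. The limits are fixed from the baseline runs rather than recomputed over all runs, so that out-of-control runs do not inflate the standard deviation and mask themselves.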
