Simultaneous Improvement in the Precision, Accuracy, and Robustness of Label-free Proteome Quantification by Optimizing Data Manipulation Chains

High-quality label-free proteome quantification (LFQ) is valuable for clinical and pharmaceutical studies yet remains extremely challenging despite technical advances. In particular, fluctuating precision, limited robustness, and compromised accuracy are known issues. Here, we described and validated a new strategy that enables the discovery of LFQs with simultaneously enhanced precision, robustness, and accuracy from thousands of LFQ manipulation chains. In the proof-of-concept study, this strategy showed a superior ability to identify well-performing LFQs. An online tool incorporating this strategy was also developed.

Highlights
High-quality LFQ is a valuable technique yet remains extremely challenging.
Fluctuating precision, limited robustness, and compromised accuracy are known issues.
We proposed a strategy that collectively improves LFQ precision, robustness, and accuracy.
An online tool incorporating this novel strategy was also developed.

Label-free proteome quantification (LFQ) is a multistep workflow, collectively defined by quantification tools and subsequent data manipulation methods, that has been extensively applied in current biomedical, agricultural, and environmental studies. Despite recent advances, in-depth and high-quality quantification remains extremely challenging and requires optimizing LFQs by comparatively evaluating their performance. However, evaluation results obtained with different criteria (precision, accuracy, and robustness) vary greatly, and the huge number of potential LFQs has become one of the bottlenecks in comprehensively optimizing proteome quantification. In this study, we therefore proposed a novel strategy that enables the discovery of LFQs with simultaneously enhanced performance from thousands of workflows (integrating 18 quantification tools with 3,128 manipulation chains).

First, the feasibility of simultaneously improving the precision, accuracy, and robustness of LFQ was systematically assessed by collectively optimizing its multistep manipulation chains. Second, using a variety of benchmark datasets acquired with different quantification measurements and modes of acquisition, this strategy successfully identified a number of manipulation chains that simultaneously improved performance across multiple criteria. Finally, to further enhance proteome quantification and discover the LFQs of optimal performance, an online tool (https://idrblab.org/anpela/) enabling collective performance assessment, from multiple perspectives, of the entire LFQ workflow was developed. This study confirmed the feasibility of achieving simultaneous improvement in precision, accuracy, and robustness. The strategy proposed and validated in this study, together with the online tool, may provide useful guidance for research fields that rely on mass-spectrometry-based LFQ.
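The idea of searching many manipulation chains for those that perform well under all three criteria at once can be illustrated with a minimal sketch. This is not the procedure used in the study or by the online tool: the step names, the three-step chain layout, the rank-based selection rule, and the scores are all hypothetical toy values, chosen only to show how a chain that is best on one criterion can still fail a joint selection.

```python
from itertools import product

# Hypothetical step options; the abstract does not list the actual methods.
TRANSFORMS = ["log2", "cube_root"]
NORMALIZATIONS = ["median", "quantile"]
IMPUTATIONS = ["knn", "zero"]


def enumerate_chains():
    """Return every (transform, normalization, imputation) combination."""
    return list(product(TRANSFORMS, NORMALIZATIONS, IMPUTATIONS))


def rank_positions(scores):
    """Map each chain to its 1-based rank (1 = best; higher score is better).

    Ties are broken by insertion order, which is adequate for a sketch.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {chain: position for position, chain in enumerate(ordered, start=1)}


def consistently_good(criteria, max_rank):
    """Keep chains ranked within max_rank under every criterion."""
    ranks = [rank_positions(scores) for scores in criteria.values()]
    return sorted(c for c in ranks[0] if all(r[c] <= max_rank for r in ranks))


chains = enumerate_chains()

# Toy scores (illustrative only, not results from the study), one per chain,
# listed in the order returned by enumerate_chains().
criteria = {
    "precision": dict(zip(chains, [0.9, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6, 0.1])),
    "accuracy": dict(zip(chains, [0.5, 0.3, 0.9, 0.2, 0.8, 0.1, 0.7, 0.4])),
    "robustness": dict(zip(chains, [0.6, 0.2, 0.8, 0.1, 0.9, 0.3, 0.4, 0.5])),
}

selected = consistently_good(criteria, max_rank=3)
```

With these toy scores, the chain ("log2", "median", "knn") ranks first for precision but only fourth for accuracy, so it is excluded, while two other chains stay within the top three under every criterion. A real evaluation would replace the toy scores with metrics computed on benchmark datasets.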
