Designing biomedical proteomics experiments: state-of-the-art and future perspectives

ABSTRACT With the expanded technical capabilities now available for mass spectrometry-based biomedical proteomics, careful attention to experimental design has become crucial. Neglecting good design can lead to an inflated rate of false discoveries that undermines the results, and a growing number of tools are being developed to help researchers design proteomics experiments. In this review, we apply statistical thinking to the entire proteomics workflow for biomarker discovery and validation, and we discuss the considerations that arise at the level of hypothesis building, technology selection, experimental design, and the optimization of experimental parameters.
