Computational and Mass-Spectrometry-Based Workflow for the Discovery and Validation of Missing Human Proteins: Application to Chromosomes 2 and 14.

In the framework of the Chromosome-Centric Human Proteome Project (C-HPP), our Franco-Swiss consortium has adopted chromosomes 2 and 14, which together code for 382 missing proteins (proteins for which evidence at the protein level is lacking). Over the last 4 years, the French proteomics infrastructure has collected high-quality data sets from 40 human samples, including a series of rarely studied cell lines, tissue types, and sample preparations. Here we describe a step-by-step strategy that combines bioinformatics screening with subsequent mass spectrometry (MS)-based validation to identify, in these data sets, proteins that were until now missing. Screening the database search results (85,326 .dat files) identified 58 missing proteins (36 on chromosome 2 and 22 on chromosome 14) from 83 unique peptides, according to the then-latest neXtProt release (2014-09-19). The peptide-spectrum matches (PSMs) corresponding to these peptides were thoroughly examined using two MS-based criteria: peptide-level false discovery rate (FDR) calculation and expert assessment of PSM quality. Synthetic peptides were then produced and used to generate reference MS/MS spectra. A spectral similarity score was calculated for each pair of reference and endogenous spectra and used as a third criterion for validating the missing proteins. Finally, liquid chromatography-selected reaction monitoring (LC-SRM) assays were developed to target proteotypic peptides from four of the missing proteins, detected in tissue or cell samples that were still available and whose sample preparation could be reproduced. These LC-SRM assays unambiguously detected the endogenous unique peptide for three of the four proteins; for two of these, the identification was confirmed by additional proteotypic peptides. We conclude that, of the initial set of 58 proteins detected by the bioinformatics screen, 13 (8 on chromosome 2 and 5 on chromosome 14) passed at least two of the three MS-based validation criteria, and we therefore propose their identification.
Thus, a rigorous step-by-step approach combining bioinformatics screening and MS-based validation assays is particularly suitable for obtaining protein-level evidence for proteins previously considered missing. All MS/MS data have been deposited in ProteomeXchange under identifier PXD002131.
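The three MS-based validation criteria and the "at least two of three" acceptance rule can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: that the peptide-level FDR is estimated with a target-decoy search (decoy/target ratio above a score threshold, converted to q-values), that the spectral similarity score is a cosine (normalized dot product) of binned, square-root-scaled spectra, and that the expert PSM assessment is supplied as a simple yes/no verdict. The function names, the 0.5 Da bin width, and the thresholds (q-value at most 0.01, similarity at least 0.8) are hypothetical choices for illustration, not the parameters used in the study.

```python
import math
from collections import defaultdict

# Hypothetical sketch of the three MS-based criteria; not the study's code.


def peptide_level_qvalues(psms):
    """Target-decoy q-values at the peptide level (assumed FDR method).

    psms: iterable of (sequence, score, is_decoy); higher score is better.
    Each peptide keeps only its best-scoring PSM; the FDR at a threshold is
    estimated as (#decoy peptides) / (#target peptides) above it.
    Tied scores are not treated specially in this sketch.
    """
    best = {}
    for seq, score, is_decoy in psms:
        key = (seq, is_decoy)
        if key not in best or score > best[key]:
            best[key] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)

    fdrs, targets, decoys = [], 0, 0
    for (seq, is_decoy), score in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: lowest FDR at which the peptide would still be accepted,
    # i.e. a running minimum taken from the bottom of the ranked list.
    qvalues, running_min = {}, float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        (seq, is_decoy), _ = ranked[i]
        if not is_decoy:
            qvalues[seq] = running_min
    return qvalues


def spectral_similarity(reference, endogenous, bin_width=0.5):
    """Cosine similarity of binned, square-root-scaled MS/MS spectra.

    reference, endogenous: lists of (m/z, intensity) peaks.
    bin_width (in Da) is a hypothetical default.
    """
    def binned(peaks):
        bins = defaultdict(float)
        for mz, intensity in peaks:
            bins[int(mz // bin_width)] += intensity
        # Square-root scaling damps the influence of a few dominant peaks.
        return {idx: math.sqrt(total) for idx, total in bins.items()}

    a, b = binned(reference), binned(endogenous)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def passes_two_of_three(q_value, similarity, expert_ok,
                        max_q=0.01, min_similarity=0.8):
    """Accept a peptide if at least two of the three criteria hold.

    The thresholds are hypothetical, not those used in the study.
    """
    criteria = [q_value <= max_q, similarity >= min_similarity, bool(expert_ok)]
    return sum(criteria) >= 2
```

For example, a peptide with a q-value of 0.0 and a positive expert verdict is accepted by `passes_two_of_three` even if its similarity to the synthetic reference spectrum falls below the threshold, which mirrors the two-of-three rule used to retain 13 of the 58 candidate proteins.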