Metrics for the Human Proteome Project 2016: Progress on Identifying and Characterizing the Human Proteome, Including Post-Translational Modifications.

The HUPO Human Proteome Project (HPP) has two overall goals: (1) stepwise completion of the protein parts list (the draft human proteome), including confidently identifying and characterizing at least one protein product from each protein-coding gene, with increasing emphasis on sequence variants, post-translational modifications (PTMs), and splice isoforms of those proteins; and (2) making proteomics an integrated counterpart to genomics throughout the biomedical and life sciences community. PeptideAtlas and GPMDB reanalyze all major human mass spectrometry data sets available through ProteomeXchange with standardized protocols and stringent quality filters; neXtProt curates and integrates mass spectrometry and other findings to present the most up-to-date authoritative compendium of the human proteome. The HPP Guidelines for Mass Spectrometry Data Interpretation version 2.1 were applied to manuscripts submitted for this 2016 C-HPP-led special issue [www.thehpp.org/guidelines]. The human proteome presented as neXtProt version 2016-02 has 16,518 confident protein identifications (Protein Existence [PE] Level 1), up from 13,664 at 2012-12, 15,646 at 2013-09, and 16,491 at 2014-10. There are 485 proteins that would have been PE1 under the 2012 Guidelines v1.0 but now have insufficient evidence under the more stringent Guidelines v2.0, which were agreed upon to reduce false positives. neXtProt and PeptideAtlas now both require two non-nested, uniquely mapping (proteotypic) peptides of at least 9 aa in length. There are 2,949 missing proteins (PE2+3+4), which form the baseline for submissions to this fourth annual C-HPP special issue of the Journal of Proteome Research. PeptideAtlas has 14,629 canonical entries (plus 1,187 uncertain and 1,755 redundant entries). GPMDB has 16,190 EC4 entries, and the Human Protein Atlas has 10,475 entries with supportive evidence.
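The two-peptide requirement described above can be illustrated with a short check. The Python below is a minimal sketch under stated assumptions, not the neXtProt or PeptideAtlas implementation. It uses a hypothetical toy proteome (`PROTEOME`, with made-up accessions and sequences), treats "uniquely mapping" as occurring as an exact substring of exactly one sequence in that toy proteome (ignoring isobaric I/L, sequence variants, and isoforms), and treats "non-nested" as neither peptide being wholly contained in the other. The function names are illustrative only, and no other part of the guidelines is modeled.

```python
from typing import Iterable

# Minimum peptide length (residues) stated in the text.
MIN_LENGTH = 9

# Hypothetical toy proteome: accessions and sequences are made up for illustration.
PROTEOME: dict[str, str] = {
    "PROT_A": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK",
    "PROT_B": "MKTAYIAKQRGGSWEEALNKLLPEHQR",
}


def maps_uniquely(peptide: str, proteome: dict[str, str]) -> bool:
    """Assumption: 'uniquely mapping' = exact substring of exactly one sequence.

    Isobaric I/L, sequence variants, and isoforms are deliberately ignored.
    """
    hits = [acc for acc, seq in proteome.items() if peptide in seq]
    return len(hits) == 1


def is_nested(a: str, b: str) -> bool:
    """Assumption: two peptides are nested if one is wholly contained in the other."""
    return a in b or b in a


def meets_two_peptide_rule(
    accession: str, peptides: Iterable[str], proteome: dict[str, str]
) -> bool:
    """Return True if at least two distinct, non-nested peptides of length >= 9
    map uniquely to `accession` in the given (toy) proteome."""
    sequence = proteome[accession]
    # Keep distinct peptides that are long enough, occur in this protein,
    # and occur in no other protein of the proteome.
    candidates = sorted(
        {
            p
            for p in peptides
            if len(p) >= MIN_LENGTH and p in sequence and maps_uniquely(p, proteome)
        }
    )
    # Any single non-nested pair is enough to satisfy the rule.
    for i, first in enumerate(candidates):
        for second in candidates[i + 1:]:
            if not is_nested(first, second):
                return True
    return False


if __name__ == "__main__":
    # Two unique, non-nested peptides (11 and 18 aa): passes.
    print(meets_two_peptide_rule(
        "PROT_A", ["QISFVKSHFSR", "QLEERLGLIEVQAPILSR"], PROTEOME))  # True
    # Second peptide is nested inside the first: fails.
    print(meets_two_peptide_rule(
        "PROT_A", ["QLEERLGLIEVQAPILSR", "LGLIEVQAPILSR"], PROTEOME))  # False
    # MKTAYIAKQR is shared with PROT_B, leaving one usable peptide: fails.
    print(meets_two_peptide_rule(
        "PROT_A", ["MKTAYIAKQR", "QISFVKSHFSR"], PROTEOME))  # False
```

Deduplicating the peptides into a set means that repeated observations of one peptide never count as two; sorting only makes the pair enumeration deterministic, since the rule itself does not depend on order.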
neXtProt, PeptideAtlas, and GPMDB are rich resources of information about PTMs, single amino acid variants (SAAVs), and splice isoforms. Meanwhile, the Biology- and Disease-driven HPP (B/D-HPP) has created comprehensive selected reaction monitoring (SRM) resources, generated popular protein lists to guide targeted proteomics assays for specific diseases, and launched an Early Career Researchers initiative.