From the human genome to the human proteome.

An ultimate goal in biology is to fully understand how a cell, or even a whole organism, works. Ideally, such knowledge might be used to develop models that predict the responses of cells to specific cues or diseases. A first step towards this goal is to identify and characterize all the molecular players present in cells. The Human Genome Project was probably one of the most ambitious scientific endeavors so far and provided the first essential pieces of this puzzle. The availability of the 3 billion base pairs that make up our DNA generated worldwide excitement, as this information might lead to an understanding of the molecular mechanisms of human pathologies. However, the complexity embedded in our genetic code soon became apparent. A surprising finding was the low percentage of DNA (less than 2% of the genome) that codes for proteins, corresponding to roughly 20,000 human genes. Nevertheless, recent analysis indicates that 80% of the human genome is functional, being either transcribed, bound by regulatory proteins, or associated with other biochemical functions.

Although genomic information is vital, it does not touch upon proteins, the main molecular effectors of cells. Most researchers would agree that analysis of the proteome is of greater relevance, yet it has been exploited less because of technical hurdles and because the proteome is inherently several orders of magnitude more complex. Whereas the genome is nearly identical in every cell of the human body and relatively constant over the lifetime of an organism, the proteome differs greatly from cell to cell and changes dramatically over time (Figure 1). Notwithstanding these challenges, the field of proteomics has witnessed tremendous developments over the last decade, primarily through advances in mass spectrometry and bioinformatics, and is now approaching parity with genomics and transcriptomics technologies.
This is evidenced by two recent reports in Nature from a German team led by Bernhard Küster and a US/India-based collaboration headed by Akhilesh Pandey, who independently initiated an unprecedented effort to identify all the human proteins encoded in the genome. To this end, the two laboratories performed extensive proteomic analyses on more than 70 human tissues and body fluids and more than 150 cell lines. Although the two teams used a very similar MS-centric workflow, the studies differ in some respects, especially in the depth of the analyses. While Pandey et al. performed around 2000 mass spectrometric (LC-MS) runs, Küster et al. carried out more than 6000 analyses and made use of another 10,000 measurements publicly available in proteomic repositories. Assuming an average of two hours per run, the instrument time used to acquire these data would reach an astonishing 34,000 h (4.3 years if only one mass spectrometer had been used). The analysis of all the data resulted in the identification of 946,000 and 293,000 nonredundant unique peptide sequences in Küster's and Pandey's studies, respectively. Strikingly, and despite the significant difference in depth, the two studies found evidence for a nearly identical number of protein-coding genes: 18,097 (Küster) and 17,294 (Pandey). Although a careful comparison of the two studies is still needed, a first conclusion can be drawn: protein translation unequivocally occurs for 90–95% of the human genes. This is a highly relevant finding, as almost one-third of the human genes had previously been barely annotated, with no experimental evidence that they give rise to proteins.

Another relevant discovery derived from these studies concerns the extent of alternative splicing in the generation of protein isoforms. It is clear that the number of genes does not correlate with the complexity of an organism (C. elegans, for instance, has 20,500 genes), and it has been suggested that alternative splicing might increase the repertoire of functional proteins. However, these proteomic studies could identify only about 9000 of the 67,000 isoforms annotated in UniProt. Although some of these isoforms may produce only one unique peptide, which decreases the likelihood of observing them by proteomics, these data could also support the idea that there is one dominant isoform per gene. Both studies confirmed the existence of a core proteome present in all tissues, made up of “housekeeping

[*] Prof. Dr. A. J. R. Heck, Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht (The Netherlands). E-mail: a.j.r.heck@uu.nl
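
The instrument-time estimate and the isoform coverage quoted in the text can be reproduced with simple arithmetic. The following is a minimal sketch, not part of the original studies. It uses only the rounded values given in the text (about 2000, 6000, and 10,000 runs, two hours per run, 9000 of 67,000 isoforms) and assumes round-the-clock operation of a single instrument. The function names are hypothetical. With these rounded inputs the total comes to roughly 36,000 h, the same order as the figure quoted in the text; the exact value depends on the precise run counts, which are given only approximately.

```python
# Back-of-the-envelope check of figures quoted in the text.
# Inputs are the rounded values from the text; HOURS_PER_YEAR assumes
# uninterrupted, 24/7 operation (an assumption, not stated in the source).

HOURS_PER_RUN = 2
HOURS_PER_YEAR = 24 * 365


def instrument_time(run_counts, hours_per_run=HOURS_PER_RUN):
    """Return (total_hours, years_on_one_instrument) for LC-MS run counts."""
    total_hours = sum(run_counts) * hours_per_run
    return total_hours, total_hours / HOURS_PER_YEAR


def fraction(part, whole):
    """Return part/whole as a plain ratio."""
    return part / whole


if __name__ == "__main__":
    # Pandey runs, Kuster runs, repository measurements (all approximate)
    hours, years = instrument_time([2000, 6000, 10000])
    print(f"{hours} h, about {years:.1f} years on a single instrument")
    print(f"isoforms observed: {fraction(9000, 67000):.0%} of those annotated")
```

Under these assumptions, only about one in eight annotated isoforms was observed, which is the gap behind the "dominant isoform per gene" interpretation discussed above.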
