Proteoforms as the next proteomics currency

Identifying precise molecular forms of proteins can improve our understanding of function.

Proteoforms, the different forms of protein produced from the genome through sequence variations, splice isoforms, and myriad posttranslational modifications (9), are critical elements in all biological systems (see the figure, left). Yang et al. (13) recently showed that the functions of proteins produced from different splice variants of the same gene (that is, different proteoforms) can differ as much as those of proteins encoded by entirely different genes. Li et al. (12) showed that splice variants play a central role in modulating complex traits. However, the standard paradigm of proteomic analysis, the “bottom-up” strategy pioneered by Eng and Yates some 20 years ago (1), does not directly identify proteoforms: proteins are enzymatically digested into peptides before measurement, so the information about which variations and modifications co-occurred on the same intact molecule is lost. We argue that proteomic analysis needs to provide the identities and abundances of the proteoforms themselves, rather than just their peptide surrogates. Developing proteome-wide strategies to accomplish this goal is a formidable but not insurmountable technological challenge, and meeting it will benefit the biomedical community.
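To make the distinction between proteoforms and their peptide surrogates concrete, the minimal Python sketch below assumes a hypothetical protein made of two peptides, each with a single site that is either phosphorylated or unmodified. The peptide sequences and the helpers `all_proteoforms`, `digest`, and `peptide_view` are illustrative inventions, not part of any proteomics software. The sketch shows that two samples containing different proteoforms (modifications on separate molecules versus both modifications on one molecule) yield identical pooled peptide observations, so peptide-level data alone cannot say which proteoforms were present.

```python
from collections import Counter
from itertools import product

# Hypothetical toy protein: two peptides, each with a single site that is
# either phosphorylated (True) or unmodified (False).
# The sequences are invented for illustration only.
PEPTIDES = ("SAMPLEK", "TESTR")


def all_proteoforms():
    """Enumerate every intact proteoform: one modification state per site."""
    return list(product((False, True), repeat=len(PEPTIDES)))


def digest(proteoform):
    """Return the peptide-level species one intact molecule yields after
    digestion; the link between the two sites is lost at this step."""
    return Counter(
        pep + ("+phospho" if modified else "")
        for pep, modified in zip(PEPTIDES, proteoform)
    )


def peptide_view(sample):
    """Pool the peptides from every molecule in a sample (a list of
    proteoforms), as a bottom-up experiment would observe them."""
    pooled = Counter()
    for proteoform in sample:
        pooled += digest(proteoform)
    return pooled


# Sample A: each molecule carries one modification, at different sites.
sample_a = [(True, False), (False, True)]
# Sample B: one doubly modified molecule plus one unmodified molecule.
sample_b = [(True, True), (False, False)]

print(len(all_proteoforms()))                            # 4
print(peptide_view(sample_a) == peptide_view(sample_b))  # True
print(sorted(sample_a) == sorted(sample_b))              # False
```

Measuring the intact molecules keeps the co-occurrence information that the `digest` step discards. The enumeration also hints at scale: with n independent two-state sites there are 2^n possible proteoforms, so even a modest number of sites yields many candidates.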

[1] Jimmy K. Eng, et al. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database, 1994, Journal of the American Society for Mass Spectrometry.

[2] Alexey I. Nesvizhskii, et al. Interpretation of Shotgun Proteomic Data, 2005, Molecular & Cellular Proteomics.

[3] F. McLafferty, et al. Extending Top-Down Mass Spectrometry to Proteins with Masses Greater Than 200 Kilodaltons, 2006, Science.

[4] J. Brumbaugh, et al. Mass spectrometry identifies and quantifies 74 unique histone H4 isoforms in differentiating human embryonic stem cells, 2008, Proceedings of the National Academy of Sciences.

[5] B. Browning, et al. Haplotype phasing: existing methods and new developments, 2011, Nature Reviews Genetics.

[6] Richard D. LeDuc, et al. Mapping Intact Protein Isoforms in Discovery Mode Using Top Down Proteomics, 2011, Nature.

[7] Bing Zhang, et al. Protein identification using customized protein sequence databases derived from RNA-Seq data, 2012, Journal of Proteome Research.

[8] D. Matthews, et al. De novo derivation of proteomes from transcriptomes for transcript and protein identification, 2012, Nature Methods.

[9] Lloyd M. Smith, et al. Proteoform: a single term describing protein complexity, 2013, Nature Methods.

[10] B. Garcia, et al. Mass spectrometric analysis of histone proteoforms, 2014, Annual Review of Analytical Chemistry.

[11] Anthony J. Cesnik, et al. Elucidating Proteoform Families from Proteoform Intact-Mass and Lysine-Count Measurements, 2016, Journal of Proteome Research.

[12] Yang I. Li, et al. RNA splicing is a primary link between genetic variation and disease, 2016, Science.

[13] Xinping Yang, et al. Widespread Expansion of Protein Interaction Capabilities by Alternative Splicing, 2016, Cell.

[14] Lloyd M. Smith, et al. How many human proteoforms are there?, 2018, Nature Chemical Biology.