Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination

Traditional “bottom-up” proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to identify peptides and their parent proteins. Protein sequences absent from the database cannot be identified, and for sequences that are present, complete sequence coverage is rarely achieved, even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, no method is available for high-throughput, full-length protein sequencing that is independent of a reference database. Here, we present Database-independent Protein Sequencing (DiPS), a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, de novo peptide sequencing, extraction of peptide tags, and assembly of those tags into a consensus sequence using an algorithm named “Peptide Tag Assembler.” As proof of concept, the method was applied to samples of three known proteins representing three size classes and to a previously unsequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99–100% accuracy for all benchmarking proteins and the antibody light chain. The antibody heavy chain was also sequenced with 100% accuracy, including the entire variable region, but with a 23-residue gap in the constant region.
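The assembly step can be illustrated with a toy example. The sketch below is not the Peptide Tag Assembler described above, whose internals are not given here; it is a generic greedy suffix-prefix overlap merge, offered only to show how overlapping tags from semi-random cleavage can be chained into a longer sequence. The function names, the `min_len` threshold, and the example tags are all assumptions made for illustration. Isoleucine is mapped to leucine before merging because the two residues have identical mass, which is the leucine/isoleucine ambiguity excluded from the accuracy figures.

```python
def normalize(tag: str) -> str:
    """Collapse the isobaric pair I/L to one symbol (L); hypothetical helper."""
    return tag.upper().replace("I", "L")


def overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(tags, min_len=3):
    """Greedily merge the best-overlapping pair of contigs until none overlap.

    Illustrative only: a greedy merge can mis-join repeated regions, and
    real spectra yield tags with errors that this sketch does not model.
    """
    # Deduplicate, longest first, so containment can be checked in one pass.
    contigs = sorted({normalize(t) for t in tags}, key=lambda s: (-len(s), s))
    # Drop tags that are fully contained in a longer tag.
    contigs = [c for i, c in enumerate(contigs)
               if not any(c in d for d in contigs[:i])]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge; gaps remain as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs = [c for c in rest if c not in merged] + [merged]


if __name__ == "__main__":
    # Made-up overlapping tags; not data from the study.
    tags = ["MKWVTFIS", "TFISLLFL", "LLFLFSSAYS", "WVTF"]
    print(assemble(tags))
```

When two groups of tags share no overlap of at least `min_len` residues, the function returns more than one contig, which is the toy analogue of a gap in an otherwise assembled sequence.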
