Single-molecule protein sequencing through fingerprinting: computational assessment

Proteins are vital in all biological systems, as they constitute the main structural and functional components of cells. Recent advances in mass spectrometry have brought the promise of complete proteomics by helping draft the human proteome. Yet this commonly used protein sequencing technique has fundamental limitations in sensitivity. Here we propose a method for single-molecule (SM) protein sequencing. A major challenge is that proteins are composed of 20 different amino acids, so reading a full sequence would demand 20 distinguishable molecular reporters. We computationally demonstrate that measuring only two types of amino acids suffices to identify proteins, and we suggest an experimental scheme based on SM fluorescence. When achieved, this highly sensitive approach will result in a paradigm shift in proteomics, with major impact in the biological and medical sciences.
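
The fingerprinting idea can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the two labeled amino acids are taken to be cysteine (C) and lysine (K), the readout is assumed to be the error-free order of those two residues, and the sequences in `toy_db` are invented for illustration rather than taken from a real proteome. The helper names `fingerprint` and `unique_fraction` are hypothetical.

```python
from collections import defaultdict


def fingerprint(sequence, labeled=("C", "K")):
    """Reduce a protein sequence to the ordered string of its labeled residues.

    Assumption: only the two labeled amino-acid types are detectable,
    and their order along the chain is read without errors.
    """
    keep = set(labeled)
    return "".join(residue for residue in sequence if residue in keep)


def unique_fraction(proteins, labeled=("C", "K")):
    """Return the fraction of proteins whose fingerprint is unique in the set."""
    groups = defaultdict(list)
    for name, sequence in proteins.items():
        groups[fingerprint(sequence, labeled)].append(name)
    # A protein is identifiable if no other entry shares its fingerprint.
    unique = sum(1 for names in groups.values() if len(names) == 1)
    return unique / len(proteins)


# Invented toy sequences (not real proteins), one-letter amino-acid code.
toy_db = {
    "protA": "MKTCAYIAKQR",
    "protB": "MCCKLVAGKSE",
    "protC": "MAKKCLLDE",
    "protD": "MGKSCPEKLT",
}

if __name__ == "__main__":
    for name, sequence in toy_db.items():
        print(name, fingerprint(sequence))
    print("uniquely identifiable fraction:", unique_fraction(toy_db))
```

In this toy set, two entries share the same C/K order and so cannot be told apart, which shows why uniqueness has to be assessed against a whole database. A realistic assessment would run the same reduction over a full proteome (for example, one downloaded from UniProt) and would also have to account for labeling and detection errors, which this sketch ignores.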
