Applications of graph theory in protein structure identification

There is growing interest in identifying proteins on a proteome-wide scale. Among the available approaches to protein structure identification, graph-theoretic methods stand out. Their lower cost and higher effectiveness, among other advantages, have drawn increasing attention from researchers. In particular, they have been widely applied to homology identification, side-chain cluster identification, and peptide sequencing. This paper reviews several graph-theoretic methods for solving protein structure identification problems. We focus on classical methods and mathematical models: homology modeling based on clique finding, identification of side-chain clusters in protein structures using the graph spectrum, and de novo peptide sequencing via tandem mass spectrometry using the spectrum graph model. Concluding remarks and future priorities are given for each method.
