Protein Sequencing with an Adaptive Genetic Algorithm from Tandem Mass Spectrometry

In proteomics, de novo peptide sequencing is the only approach that can recover a partial amino acid sequence of a peptide from an MS/MS spectrum. This article presents preliminary work toward discovering a complete protein sequence from spectral data (MS and MS/MS spectra). At present, our approach uses only MS spectra. We designed a genetic algorithm (GA) with a new evaluation function that takes a complete MS spectrum directly as input, rather than a peak mass list as other methods using this kind of data do. This eliminates the monoisotopic peak extraction step, which requires human intervention. The goal of this approach is to discover the sequences of unknown proteins and to better understand the differences between experimental proteins and their counterparts in databases.
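The abstract does not specify the evaluation function, so the following is only a minimal Python sketch, under stated assumptions, of what scoring a candidate against a raw MS spectrum (rather than a peak list) can look like. A candidate protein is digested in silico with a simplified trypsin rule (cleave after K or R unless followed by P). The sketch computes each peptide's singly protonated monoisotopic m/z and sums the highest raw intensity found within ±`tol` of each predicted m/z, so no monoisotopic peak picking is needed. The names `spectrum_fitness` and `evolve`, the 0.5 Da tolerance, fixed-length candidates, elitist selection, and the fixed (non-adaptive) mutation rate are all hypothetical illustrations, not the authors' design; the adaptive operator control implied by the title is not modeled here.

```python
import bisect
import random

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
AMINO_ACIDS = "".join(sorted(RESIDUE_MASS))


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide):
    """Singly protonated monoisotopic m/z ([M+H]+) of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def spectrum_fitness(protein, mz, intensity, tol=0.5):
    """Hypothetical score: for each predicted peptide, add the most intense raw
    spectrum point within +/- tol of its m/z. `mz` must be sorted ascending."""
    score = 0.0
    for pep in tryptic_peptides(protein):
        target = mh_plus(pep)
        lo = bisect.bisect_left(mz, target - tol)
        hi = bisect.bisect_right(mz, target + tol)
        if lo < hi:
            score += max(intensity[lo:hi])
    return score


def mutate(seq, rate, rng):
    """Replace each residue by a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a, b, rng):
    """One-point crossover of two equal-length sequences (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(length, mz, intensity, pop_size=60, generations=200, rate=0.05, seed=0):
    """Toy elitist GA over fixed-length candidate sequences."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(pop_size)]

    def fit(seq):
        return spectrum_fitness(seq, mz, intensity)

    for _ in range(generations):
        elite = sorted(pop, key=fit, reverse=True)[: pop_size // 5]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)  # parents drawn from the elite
            children.append(mutate(crossover(a, b, rng), rate, rng))
        pop = elite + children
    return max(pop, key=fit)
```

Taking the maximum intensity inside each window is what lets the score run on profile data directly; a closer model would compare each peptide's full theoretical isotope distribution against the spectrum, which this sketch does not attempt.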