Constrained de novo peptide identification via multi-objective optimization

Summary form only given. Automatic de novo peptide identification from collision-induced dissociation tandem mass spectrometry data is difficult for two reasons: the fitness landscapes of scoring functions contain large plateaus, and noise in the data makes the constraints fuzzy. A framework is presented for combining different peptide identification methods within a parallel genetic algorithm. The distinctive feature of our approach, which is based on Pareto ranking, is that it can accommodate constraints and possibly conflicting scoring functions. We also show how population structure can significantly reduce the wall-clock time of a parallel peptide identification genetic algorithm while still maintaining some exchange of information across local populations.
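To make the idea of Pareto ranking concrete, the following is a minimal sketch, not the authors' implementation. It assumes every objective is to be maximized and that a constraint is folded in as one extra objective (the negated constraint violation), which is one common convention and only an assumption here. The function names `dominates` and `pareto_ranks`, and the candidate scores, are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b in every objective
    and strictly better in at least one (all objectives maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_ranks(points: List[Sequence[float]]) -> List[int]:
    """Assign each point the index of the non-dominated front it belongs to.

    Front 0 holds points that no other point dominates; front 1 holds
    points that become non-dominated once front 0 is removed, and so on.
    """
    ranks = [-1] * len(points)
    remaining = set(range(len(points)))
    front = 0
    while remaining:
        # Points not dominated by any other point still in play.
        current = {
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        }
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks


# Hypothetical candidates: (score from method A, score from method B,
# negated constraint violation). Values are made up for illustration.
candidates = [
    (0.90, 0.20, 0.0),   # strong on A, feasible
    (0.40, 0.80, 0.0),   # strong on B, feasible
    (0.30, 0.10, 0.0),   # weak on both, feasible
    (0.95, 0.90, -2.0),  # strong on both, but violates the constraint
]

print(pareto_ranks(candidates))  # → [0, 0, 1, 0]
```

In this sketch the two scoring functions can disagree without either being forced to win: the first two candidates share front 0 even though each is better on a different score. The constraint-violating candidate also stays in front 0 rather than being discarded, which illustrates how a ranking of this kind can tolerate fuzzy constraints.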
