Mass spectra alignments and their significance

Mass spectrometry has become one of the most popular analysis techniques in genomics and systems biology. We investigate a general framework that allows the alignment (or matching) of any two mass spectra. In particular, we examine the alignment of a reference mass spectrum, generated in silico from a database, with a measured sample mass spectrum. For this setting, we assess the significance of alignment scores for character-specific cleavage experiments, such as tryptic digestion of proteins (strings over the amino acid alphabet). We present an efficient approach to estimate this significance, with runtime linear in the number of detected peaks. As part of this analysis, we investigate the probability that a random string over a weighted alphabet contains a substring of some given weight.
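
The last quantity can be illustrated with a small dynamic program. The Python sketch below is not the method of this paper. It is a straightforward exact computation that rests on simplifying assumptions:

- Characters are drawn i.i.d.
- Masses are positive integers.
- The string is cut after every cleavage character, a toy stand-in for trypsin cutting after K and R that ignores the proline rule.
- "Contains a substring of some given weight" is read as "has a cleavage fragment of exactly that mass".

The toy alphabet `TOY_MASSES`/`TOY_PROBS` and all function names are hypothetical. The DP runs in O(n · M · |Σ|) time for string length n, target mass M and alphabet Σ. It therefore only illustrates the probability being estimated, not the runtime linear in the number of peaks stated above.

```python
from itertools import product

# Hypothetical toy alphabet: integer masses and i.i.d. character probabilities.
# "K" plays the role of the cleavage character.
TOY_MASSES = {"A": 1, "B": 2, "K": 3}
TOY_PROBS = {"A": 0.5, "B": 0.3, "K": 0.2}
TOY_CLEAVE = {"K"}


def fragment_hit_probability(masses, probs, cleave, n, target):
    """Exact P(a random length-n string has a cleavage fragment of mass `target`).

    Assumes positive integer masses and target >= 1. The string is cut after
    every character in `cleave`; the last fragment ends at the string end.
    """
    over = target + 1                 # state: open fragment already heavier than target
    dist = [0.0] * (target + 2)       # dist[m] = P(no hit yet, open fragment has mass m)
    dist[0] = 1.0
    hit = 0.0                         # absorbed probability of at least one hit
    for _ in range(n):
        new = [0.0] * (target + 2)
        for m, q in enumerate(dist):
            if q == 0.0:
                continue
            for ch, p in probs.items():
                t = min(m + masses[ch], over)  # saturates at the "over" state
                if ch in cleave:
                    if t == target:
                        hit += q * p          # fragment closes with exactly the target mass
                    else:
                        new[0] += q * p       # fragment closes; a new one starts empty
                else:
                    new[t] += q * p           # fragment keeps growing
        dist = new
    # An open fragment at the end of the string counts as a fragment too.
    return hit + dist[target]


def fragments(s, cleave):
    """Split s after every cleavage character (hypothetical helper)."""
    out, cur = [], []
    for ch in s:
        cur.append(ch)
        if ch in cleave:
            out.append(cur)
            cur = []
    if cur:
        out.append(cur)
    return out


def brute_force_probability(masses, probs, cleave, n, target):
    """Same quantity by enumerating all |alphabet|**n strings (small n only)."""
    total = 0.0
    for s in product(probs, repeat=n):
        if any(sum(masses[c] for c in f) == target for f in fragments(s, cleave)):
            weight = 1.0
            for c in s:
                weight *= probs[c]
            total += weight
    return total


if __name__ == "__main__":
    for target in range(1, 8):
        exact = fragment_hit_probability(TOY_MASSES, TOY_PROBS, TOY_CLEAVE, 8, target)
        check = brute_force_probability(TOY_MASSES, TOY_PROBS, TOY_CLEAVE, 8, target)
        print(target, round(exact, 6), round(check, 6))
```

Comparing the DP against brute-force enumeration for short strings is a cheap sanity check. Real residue masses are not integers, so applying a sketch like this would first require discretizing masses to some fixed resolution. That is an additional assumption, not something stated above.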
