A Framework for Mass Spectral Quality Assessment without Prior Information

It is well known that the majority of experimental spectra are of too poor quality to be interpreted by any automatic method, and attempting to interpret these "un-interpretable" spectra wastes time. Conversely, some high-quality spectra cannot be interpreted by any automatic method but may still be interpretable by manual inspection. It is therefore worthwhile to develop a powerful filter that eliminates poor-quality spectra before any interpretation is attempted. This paper proposes a framework for assessing the quality of tandem mass spectra without prior information. The proposed framework includes: (1) filtering noise from the experimental mass spectra; (2) extracting the peaks; (3) mapping each spectrum into a feature vector that describes its quality; (4) grouping the spectra into clusters using an unsupervised classification method; (5) training a classifier using the cluster of high-quality spectra and the cluster of poor-quality spectra; and (6) assessing all spectra using the trained classifier. The proposed framework has been implemented and tested on two tandem mass spectra datasets acquired by ion trap mass spectrometers. Computational experiments show that a method based on the proposed framework can eliminate the majority of poor-quality spectra while losing only a small minority of high-quality spectra.
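The six steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the noise filter, the features, the clustering method, or the classifier, so every concrete choice here is an assumption made for illustration. Specifically, the sketch assumes a median-based intensity threshold for noise filtering, top-k selection for peak extraction, two hypothetical features (fraction of intensity retained and log peak count), 2-means clustering, and a nearest-centroid classifier. It also assumes that the cluster with the higher retained-intensity fraction is the high-quality one. All function and parameter names are hypothetical.

```python
import math
from statistics import median

# A spectrum is assumed to be a list of (m/z, intensity) tuples.

def filter_noise(spectrum, k=3.0):
    """Step 1 (assumed rule): drop peaks below k times the median intensity."""
    if not spectrum:
        return []
    threshold = k * median([i for _, i in spectrum])
    return [(mz, i) for mz, i in spectrum if i >= threshold]

def extract_peaks(spectrum, top_k=50):
    """Step 2 (assumed rule): keep the top_k most intense peaks, sorted by m/z."""
    strongest = sorted(spectrum, key=lambda p: p[1], reverse=True)[:top_k]
    return sorted(strongest)

def feature_vector(raw, peaks):
    """Step 3: two illustrative features, not the features used in the paper."""
    total = sum(i for _, i in raw)
    retained = sum(i for _, i in peaks) / total if total > 0 else 0.0
    return [retained, math.log1p(len(peaks))]

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def two_means(vectors, max_iter=100):
    """Step 4: 2-means clustering, a stand-in for the unsupervised method."""
    centroids = [min(vectors), max(vectors)]  # deterministic initialisation
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if dist2(v, centroids[0]) <= dist2(v, centroids[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in (0, 1):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = mean(members)
    return labels

def train_classifier(vectors, labels):
    """Step 5: nearest-centroid classifier trained on the two clusters."""
    groups = [[v for v, l in zip(vectors, labels) if l == c] for c in (0, 1)]
    if not all(groups):
        raise ValueError("clustering must produce two non-empty clusters")
    a, b = mean(groups[0]), mean(groups[1])
    # Assumption: the cluster retaining more intensity is the high-quality one.
    good, poor = (a, b) if a[0] >= b[0] else (b, a)
    return lambda v: dist2(v, good) <= dist2(v, poor)

def assess(spectra):
    """Step 6: return True for each spectrum judged worth interpreting."""
    vectors = [feature_vector(s, extract_peaks(filter_noise(s))) for s in spectra]
    classify = train_classifier(vectors, two_means(vectors))
    return [classify(v) for v in vectors]

# Toy data: low-intensity background, with and without a few strong peaks.
noise = [(100.0 + 10 * j, 1.0 + (j % 3) * 0.5) for j in range(20)]
signal = [(150.5, 80.0), (263.1, 95.0), (376.2, 60.0), (489.3, 100.0), (602.4, 70.0)]
keep = assess([noise + signal, list(noise), noise + signal, list(noise)])
print(keep)
```

Because the nearest-centroid classifier is trained on the same clusters that 2-means produced, steps 5 and 6 add little in this toy version. In a fuller pipeline the classifier would typically be a different model, trained on the cluster labels and then applied to all spectra, including ones not used for clustering.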
