Alignment of Mass Spectrometry Data by Clique Finding and Optimization

Mass spectrometry (MS) is becoming a popular approach for quantifying the protein composition of complex samples. A major challenge in comparative proteomic profiling is matching corresponding peptide features across experiments, so that intensities attributed to the same protein are compared correctly. Multi-dimensional data acquisition from liquid-chromatography mass spectrometry (LC-MS) makes the alignment problem harder. We propose a general paradigm for aligning peptide features using a bounded error model. Our method is tolerant of imperfect measurements, missing peaks, and extraneous peaks. It can handle an arbitrary number of dimensions of separation, and is very fast in practice even for large data sets. Finally, its parameters are intuitive, and we describe a heuristic for estimating them automatically. We demonstrate results on single- and multi-dimensional data.
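As a rough illustration of how a bounded error model can lead to clique finding, consider the one-dimensional case. The sketch below is an assumption-laden reading of the idea, not the method described in the paper: it assumes each peak's true position lies within a fixed bound `eps` of its measured position, treats each peak as the closed interval `[pos - eps, pos + eps]`, and enumerates the maximal sets of mutually overlapping intervals (the maximal cliques of the interval intersection graph) with a sweep over sorted endpoints. The function name `maximal_cliques_1d`, the `(sample_id, position)` input layout, and the single shared `eps` are all hypothetical choices made for this example. The optimization step mentioned in the title and the multi-dimensional case are not covered.

```python
def maximal_cliques_1d(peaks, eps):
    """Enumerate maximal groups of peaks whose error intervals all overlap.

    peaks: list of (sample_id, position) tuples (hypothetical layout).
    eps:   assumed bound on the measurement error of every peak.
    Returns a list of sorted lists of indices into `peaks`.
    """
    events = []
    for idx, (_, pos) in enumerate(peaks):
        events.append((pos - eps, 0, idx))  # kind 0: interval opens
        events.append((pos + eps, 1, idx))  # kind 1: interval closes
    # Tuples sort by coordinate first, then kind, so at equal coordinates
    # an opening precedes a closing and touching intervals count as overlapping.
    events.sort()

    active = set()         # intervals currently covering the sweep point
    cliques = []
    last_was_open = False  # a closing right after an opening marks a maximal clique
    for _, kind, idx in events:
        if kind == 0:
            active.add(idx)
            last_was_open = True
        else:
            if last_was_open:
                cliques.append(sorted(active))
            active.discard(idx)
            last_was_open = False
    return cliques


# Toy data: three runs ("A", "B", "C") with integer positions for clarity.
example_peaks = [("A", 1000), ("B", 1003), ("C", 1005), ("A", 1050), ("B", 1052)]
example_cliques = maximal_cliques_1d(example_peaks, eps=2)
```

Each returned group is a candidate set of peaks that could share one true position under the assumed bound. A peak may appear in more than one group, and a group may contain two peaks from the same run, so a later selection step would still be needed to pick a consistent alignment.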
